#Large Language Models

374 posts

[Paper Review] ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

[Paper Review] Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

[Paper Review] MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU

[Paper Review] LightThinker++: From Reasoning Compression to Memory Management

[Paper Review] DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

[Paper Review] Reasoning Shift: How Context Silently Shortens LLM Reasoning

[Paper Review] MemRerank: Preference Memory for Personalized Product Reranking

[Paper Review] Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

[Paper Review] WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

[Paper Review] A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

[Paper Review] Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity

[Paper Review] EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning

[Paper Review] Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

[Paper Review] Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation

[Paper Review] Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

[Paper Review] Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

[Paper Review] From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning Tasks

[Paper Review] MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

[Paper Review] BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

[Paper Review] PokeeResearch: Effective Deep Research via Reinforcement Learning from AI Feedback and Robust Reasoning Scaffold

[Paper Review] Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain

[Paper Review] Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training

[Paper Review] Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints

[Paper Review] EvolProver: Advancing Automated Theorem Proving by Evolving Formalized Problems via Symmetry and Difficulty

[Paper Review] DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context

[Paper Review] Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge

[Paper Review] AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models

[Paper Review] TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

[Paper Review] Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

[Paper Review] On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

[Paper Review] Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study

[Paper Review] Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling

[Paper Review] RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization