#LLMs

118 posts

[Paper Review] You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

[Paper Review] QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization

[Paper Review] Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs

[Paper Review] The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness

[Paper Review] Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking

[Paper Review] Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

[Paper Review] Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs

[Paper Review] FeatureBench: Benchmarking Agentic Coding for Complex Feature Development

[Paper Review] Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards

[Paper Review] Semantic Search over 9 Million Mathematical Theorems

[Paper Review] Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR

[Paper Review] Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

[Paper Review] CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs

[Paper Review] Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

[Paper Review] Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation

[Paper Review] Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

[Paper Review] MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

[Paper Review] LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

[Paper Review] Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization

[Paper Review] Structured Episodic Event Memory

[Paper Review] PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

[Paper Review] GenCtrl -- A Formal Controllability Toolkit for Generative Models

[Paper Review] Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding

[Paper Review] Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

[Paper Review] NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

[Paper Review] CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

[Paper Review] ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

[Paper Review] MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism

[Paper Review] RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization

[Paper Review] Multi-Step Knowledge Interaction Analysis via Rank-2 Subspace Disentanglement

[Paper Review] SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens

[Paper Review] Instruction-Following Evaluation in Function Calling for Large Language Models

[Paper Review] SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

[Paper Review] SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

[Paper Review] From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature

[Paper Review] EpiCache: Episodic KV Cache Management for Long Conversational Question Answering

[Paper Review] RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation

[Paper Review] Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

[Paper Review] WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

[Paper Review] GAPrune: Gradient-Alignment Pruning for Domain-Aware Embeddings

[Paper Review] <think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs

[Paper Review] Reverse-Engineered Reasoning for Open-Ended Generation

[Paper Review] Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

[Paper Review] The Gold Medals in an Empty Room: Diagnosing Metalinguistic Reasoning in LLMs with Camlang

[Paper Review] Metis: Training Large Language Models with Advanced Low-Bit Quantization

[Paper Review] FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games

[Paper Review] UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

[Paper Review] ST-Raptor: LLM-Powered Semi-Structured Table Question Answering

[Paper Review] CRISP: Persistent Concept Unlearning via Sparse Autoencoders

[Paper Review] Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

[Paper Review] BiasGym: Fantastic Biases and How to Find (and Remove) Them

[Paper Review] Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs

[Paper Review] Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal

[Paper Review] MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

[Paper Review] IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards

[Paper Review] A Survey of Data Agents: Emerging Paradigm or Overstated Hype?

[Paper Review] Large Language Models Do NOT Really Know What They Don't Know

[Paper Review] Dr.LLM: Dynamic Layer Routing in LLMs

[Paper Review] Training Dynamics Impact Post-Training Quantization Robustness

[Paper Review] DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning

[Paper Review] Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers

[Paper Review] PIPer: On-Device Environment Setup via Online Reinforcement Learning
