#Large Language Models (LLMs)

166 posts

[Paper Review] Not only where, But when: Temporal Scheduling for RLVR

[Paper Review] More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

[Paper Review] OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

[Paper Review] RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation

[Paper Review] CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

[Paper Review] MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants

[Paper Review] Lost in Stories: Consistency Bugs in Long Story Generation by LLMs

[Paper Review] Progressive Residual Warmup for Language Model Pretraining

[Paper Review] HiMAP-Travel: Hierarchical Multi-Agent Planning for Long-Horizon Constrained Travel

[Paper Review] DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

[Paper Review] Qwen3-Coder-Next Technical Report

[Paper Review] How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

[Paper Review] Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data

[Paper Review] CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

[Paper Review] AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games

[Paper Review] SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning

[Paper Review] InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem

[Paper Review] A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn't)

[Paper Review] Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm

[Paper Review] Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

[Paper Review] When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning

[Paper Review] QP-OneModel: A Unified Generative LLM for Multi-Task Query Understanding in Xiaohongshu Search

[Paper Review] G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design

[Paper Review] CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion

[Paper Review] Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning

[Paper Review] Chain of Mindset: Reasoning with Adaptive Cognitive Modes

[Paper Review] LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

[Paper Review] On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

[Paper Review] V-Retrver: Evidence-Driven Agentic Reasoning for Universal Multimodal Retrieval

[Paper Review] Multi-Task GRPO: Reliable LLM Reasoning Across Tasks

[Paper Review] Scaling Embeddings Outperforms Scaling Experts in Language Models

[Paper Review] Beyond Imitation: Reinforcement Learning for Active Latent Planning

[Paper Review] Reinforcement Learning via Self-Distillation

[Paper Review] Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection

[Paper Review] LongCat-Flash-Thinking-2601 Technical Report

[Paper Review] Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

[Paper Review] Facilitating Proactive and Reactive Guidance for Decision Making on the Web: A Design Probe with WebSeek

[Paper Review] YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation

[Paper Review] Reasoning Models Generate Societies of Thought

[Paper Review] EvasionBench: Detecting Evasive Answers in Financial Q&A via Multi-Model Consensus and LLM-as-Judge

[Paper Review] Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

[Paper Review] A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation

[Paper Review] ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration

[Paper Review] Dr. Zero: Self-Evolving Search Agents without Training Data

[Paper Review] Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction

[Paper Review] Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

[Paper Review] X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework

[Paper Review] SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

[Paper Review] AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents

[Paper Review] VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs

[Paper Review] Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

[Paper Review] UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models

[Paper Review] Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

[Paper Review] SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories

[Paper Review] SignRoundV2: Closing the Performance Gap in Extremely Low-Bit Post-Training Quantization for LLMs

[Paper Review] REFLEX: Self-Refining Explainable Fact-Checking via Disentangling Truth into Style and Substance

[Paper Review] On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

[Paper Review] Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates

[Paper Review] Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

[Paper Review] DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

[Paper Review] SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space

[Paper Review] Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework

[Paper Review] Genomic Next-Token Predictors are In-Context Learners

[Paper Review] Black-Box On-Policy Distillation of Large Language Models

[Paper Review] MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

[Paper Review] Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces

[Paper Review] Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs

[Paper Review] TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data

[Paper Review] BRAINS: A Retrieval-Augmented System for Alzheimer's Detection and Monitoring

[Paper Review] Towards Robust Mathematical Reasoning

[Paper Review] Data-Efficient RLVR via Off-Policy Influence Guidance

[Paper Review] MisSynth: Improving MISSCI Logical Fallacies Classification with Synthetic Data

[Paper Review] Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

[Paper Review] INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

[Paper Review] Continuous Autoregressive Language Models

[Paper Review] Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents

[Paper Review] Improving Context Fidelity via Native Retrieval-Augmented Reasoning

[Paper Review] The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward

[Paper Review] WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

[Paper Review] Towards a Unified View of Large Language Model Post-Training

[Paper Review] Attributes as Textual Genes: Leveraging LLMs as Genetic Algorithm Simulators for Conditional Synthetic Data Generation

[Paper Review] T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables

[Paper Review] Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD

[Paper Review] OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models

[Paper Review] Leveraging Large Language Models for Predictive Analysis of Human Misery

[Paper Review] TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation

[Paper Review] GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay

[Paper Review] Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

[Paper Review] Tool-integrated Reinforcement Learning for Repo Deep Search

[Paper Review] The End of Manual Decoding: Towards Truly End-to-End Language Models

[Paper Review] OmniLayout: Enabling Coarse-to-Fine Learning with LLMs for Universal Document Layout Generation

[Paper Review] Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets

[Paper Review] Evolving Diagnostic Agents in a Virtual Clinical Environment

[Paper Review] ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks

[Paper Review] Generalization or Memorization: Dynamic Decoding for Mode Steering

[Paper Review] FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling

[Paper Review] MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems

[Paper Review] Reasoning in Space via Grounding in the World

[Paper Review] MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model

[Paper Review] SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

[Paper Review] LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens

[Paper Review] DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation

[Paper Review] Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

[Paper Review] Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization

[Paper Review] Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

[Paper Review] First Try Matters: Revisiting the Role of Reflection in Reasoning Models

[Paper Review] Native Hybrid Attention for Efficient Sequence Modeling

[Paper Review] Cache-to-Cache: Direct Semantic Communication Between Large Language Models

[Paper Review] In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

[Paper Review] Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

[Paper Review] ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature

[Paper Review] AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

[Paper Review] Executable Knowledge Graphs for Replicating AI Research

[Paper Review] Rewiring Experts on the Fly: Continuous Rerouting for Better Online Adaptation in Mixture-of-Expert models

[Paper Review] ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models

[Paper Review] DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

[Paper Review] Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

[Paper Review] OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!