#Supervised Fine-tuning

58 posts

[Paper Review] LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

[Paper Review] Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

[Paper Review] DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

[Paper Review] DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

[Paper Review] When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains

[Paper Review] CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

[Paper Review] GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

[Paper Review] RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

[Paper Review] SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

[Paper Review] Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification

[Paper Review] Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models

[Paper Review] VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

[Paper Review] Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification

[Paper Review] Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey

[Paper Review] EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs

[Paper Review] SO-Bench: A Structural Output Evaluation of Multimodal LLMs

[Paper Review] From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images

[Paper Review] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

[Paper Review] Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

[Paper Review] Motif 2 12.7B technical report

[Paper Review] DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

[Paper Review] DeepEyesV2: Toward Agentic Multimodal Model

[Paper Review] UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings

[Paper Review] Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning

[Paper Review] SAIL-VL2 Technical Report

[Paper Review] Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

[Paper Review] Detect Anything via Next Point Prediction

[Paper Review] A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks

[Paper Review] TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

[Paper Review] Judging with Confidence: Calibrating Autoraters to Preference Distributions

[Paper Review] Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data

[Paper Review] Directional Reasoning Injection for Fine-Tuning MLLMs

[Paper Review] PIPer: On-Device Environment Setup via Online Reinforcement Learning

[Paper Review] Infusing Theory of Mind into Socially Intelligent LLM Agents
