#Supervised Fine-tuning (SFT)

20 posts

[Paper Review] RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

[Paper Review] RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation

[Paper Review] On Data Engineering for Scaling LLM Terminal Capabilities

[Paper Review] X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests

[Paper Review] SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

[Paper Review] Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

[Paper Review] Monet: Reasoning in Latent Visual Space Beyond Images and Language

[Paper Review] Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

[Paper Review] WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

[Paper Review] Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information

[Paper Review] InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

[Paper Review] Tool-integrated Reinforcement Learning for Repo Deep Search

[Paper Review] ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability

[Paper Review] First Try Matters: Revisiting the Role of Reflection in Reasoning Models

[Paper Review] Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought

[Paper Review] Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

[Paper Review] A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning

[Paper Review] Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum
