#Supervised Fine-Tuning

33 posts

[Paper Review] DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

[Paper Review] DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

[Paper Review] Structured Distillation of Web Agent Capabilities Enables Generalization

[Paper Review] Embarrassingly Simple Self-Distillation Improves Code Generation

[Paper Review] DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents

[Paper Review] ProAct: Agentic Lookahead in Interactive Environments

[Paper Review] Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation

[Paper Review] Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

[Paper Review] SkillFactory: Self-Distillation For Learning Cognitive Behaviors

[Paper Review] WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

[Paper Review] RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services

[Paper Review] WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

[Paper Review] WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

[Paper Review] Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

[Paper Review] On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

[Paper Review] Aryabhata: An exam-focused language model for JEE Math

[Paper Review] Reasoning Language Models for Root Cause Analysis in 5G Wireless Networks
