#Self-Distillation

28개의 포스트

[논문리뷰] GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

dLLMs는 기존의 Autoregressive Models(ARMs) 대비 효율적인 생성 성능을 제공하지만, 최적의 성능을 위해 필요한 강화학습(RL) 적용 시 정책 likelihood가 계산 불가능하다는 핵심적인 난관에 직면합니다.

#Review #Diffusion Language Models #Reinforcement Learning #Self-Distillation #Training-Inference Mismatch #Logit Matching

2026년 5월 31일

[논문리뷰] HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

Long-horizon 과업에서 에이전트가 Sparse Reward 환경 하에 학습할 때, 전통적인 탐색 방법은 최적의 Policy를 수렴하는 데 극도로 긴 시간이 소요됩니다.

#Review #Long-Horizon #Self-Distillation #Hindsight Experience Replay #Reinforcement Learning #Sparse Reward #Goal-Conditioned Policy

2026년 5월 24일

[논문리뷰] It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

본 논문은 LLM이 개인 비서(Personal Agent)로 활용되면서 발생하는 문맥적 프라이버시(Contextual Integrity) 문제를 해결하고자 합니다.

#Review #Contextual Integrity #Large Language Models #Self-Distillation #Product-of-Experts #Privacy-Utility Trade-off #Alignment

2026년 5월 20일

[논문리뷰] CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

본 논문은 RLVR 환경에서 기존 정책 최적화 방식들이 겪는 불균일한 credit assignment 문제를 해결하기 위해 CEPO를 제안합니다. 기존의 GRPO와 같은 방식은 전체 시퀀스에 동일한 보상을 부여하여 결정적 추론 단계와 단순 서술 토큰을 구분하지 못하는 한계가 있습니다.

#Review #RLVR #Credit Assignment #Self-Distillation #Contrastive Learning #Policy Optimization #Information Leakage

2026년 5월 19일

[논문리뷰] Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

본 논문은 LLM의 추론 능력을 강화하기 위한 on-policy self-distillation 기법이 수학적 추론 과제에서 일관된 성능 향상을 보이지 못하는 문제를 해결합니다.

#Review #Reinforcement Learning #Self-Distillation #Reasoning #Pointwise Mutual Information #LLM #GRPO #Jensen-Shannon Divergence

2026년 5월 19일

[논문리뷰] Post-Trained MoE Can Skip Half Experts via Self-Distillation

기존의 Dynamic MoE 연구들은 주로 모델을 밑바닥부터 재학습(from scratch)하거나 특정 작업에만 국한된 적응 방식을 취해왔습니다. 그러나 실제 현업에서는 이미 사전 학습 및 후속 학습(SFT, RL 등)이 완료된 Post-Trained MoE 모델을 활용하는 경우가 대부분입니다.

#Review #Mixture-of-Experts #Dynamic Inference #Self-Distillation #Zero-Expert Injection #Large Language Models #Model Adaptation

2026년 5월 18일

[논문리뷰] MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

본 논문은 LLM에 새로운 지식을 주입할 때 발생하는 Catastrophic Forgetting 문제를 해결하고자 한다.

#Review #Knowledge Injection #Self-Distillation #Catastrophic Forgetting #Language Models #Distribution Alignment #Fine-tuning

2026년 5월 18일

[논문리뷰] Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation

본 논문은 현대의 Omni-modal LLM들이 기록하는 벤치마크 성능 향상이 진정한 모달리티 통합(integration)보다는 visual shortcut을 활용한 결과일 수 있다는 문제를 제기합니다.

#Review #Omni-modal LLM #Visual Leakage #OmniClean #Staged Post-Training #Self-Distillation #Reinforcement Learning

2026년 5월 14일

[논문리뷰] Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

본 논문은 LLM reasoning을 위한 On-Policy Self-Distillation (OPSD)에서 teacher-side exposure mismatch라는 간과된 bottleneck을 식별하고 해결하고자 합니다.

#Review #Self-Distillation #LLM Reasoning #Teacher Exposure #On-Policy #Adaptive Control #Reinforcement Learning #Beta-policy

2026년 5월 14일

[논문리뷰] The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

본 연구는 OPD와 OPSD가 시스템 프롬프트 및 지식 내재화에는 효과적이나, 최근 연구들에서 보고된 학습 불안정성(instability) 및 성능 저하(degradation) 문제를 근본적으로 규명하고자 합니다.

#Review #On-Policy Distillation #Self-Distillation #Language Models #Reverse-KL #Privileged Information #Optimization Stability #RLVR

2026년 5월 12일

[논문리뷰] UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

본 논문은 기존 LLM의 post-training 과정이 외부 모델에 지나치게 의존함으로써 발생하는 비용 문제와 보안 위험을 해결하기 위해 UniSD라는 통일된 Self-Distillation 프레임워크를 제안합니다.

#Review #Self-Distillation #Large Language Models #On-Policy Learning #Supervision Reliability #Representation Alignment #Training Stability

2026년 5월 10일

[논문리뷰] Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models

본 논문은 Q-Zoom이라는 2단계 적응형 프레임워크를 통해 시각적 인지 효율성을 개선한다. 첫 번째 단계인 Dynamic Gating Network는 consistency-aware 훈련 전략을 통해 고해상도 처리가 불필요한 쿼리를 식별하여 우회함으로써 불필요한 계산을 줄인다.

#Review #Multimodal Large Language Models #Efficient Perception #Dynamic Gating #Region Proposal Network #Self-Distillation #High-Resolution Adaptation

2026년 4월 8일

[논문리뷰] Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing

저자들은 샘플의 학습 상태에 따라 적절한 최적화 방식을 할당하는 SRPO (Sample-Routed Policy Optimization)를 제안합니다 . SRPO는 정답 샘플에 대해서는 GRPO의 보상 정렬(reward-aligned) 강화를 적용하고, 오류 샘플 중 피드백 정보가 가용한 경우에는 SDPO의 정밀한 logit 수준 교정을 적용합니다.

#Review #RLVR #GRPO #SDPO #Sample Routing #Policy Optimization #Self-Distillation

2026년 4월 6일

[논문리뷰] Self-Distilled RLVR

본 논문은 OPSD 가 훈련 초기에는 성능 향상을 보이나, 곧 정보 누출(Information Leakage)로 인해 성능이 저하되는 원인을 규명하고 이를 해결하고자 합니다.

#Review #LLM Post-training #Reinforcement Learning #Self-Distillation #Information Asymmetry #Credit Assignment #RLVR

2026년 4월 5일

[논문리뷰] Embarrassingly Simple Self-Distillation Improves Code Generation

본 논문은 LLM의 코드 생성 능력을 향상하기 위해 외부의 고품질 인간 작성 데이터나 복잡한 강화 학습(RL) 파이프라인 없이 모델 스스로 개선될 수 있는지에 대한 의문을 제기합니다.

#Review #Self-Distillation #Code Generation #Large Language Models #Precision-Exploration Conflict #Supervised Fine-Tuning #Temperature Scaling #Truncation

2026년 4월 1일

[논문리뷰] Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs

본 논문은 Multimodal Large Language Models (MLLMs) 가 텍스트를 이미지 형태로 처리할 때 발생하는 '모달리티 갭(modality gap)'을 체계적으로 진단하고 해결하는 것을 목표로 합니다.

#Review #Multimodal LLMs #Modality Gap #Visual Text Understanding #Error Analysis #Self-Distillation #Text-to-Image Conversion #Reasoning Collapse

2026년 3월 10일

[논문리뷰] On-Policy Self-Distillation for Reasoning Compression

본 논문은 대규모 언어 모델(LLM)이 추론 과정에서 생성하는 불필요하고 과도한 토큰으로 인한 비효율성 및 오류 누적 문제 를 해결하고자 합니다. 정답 데이터나 토큰 예산 같은 외부 제약 없이 모델 스스로 간결하게 추론하도록 학습시켜, 추론 과정의 압축과 동시에 정확도를 향상시키는 방법론을 제안합니다.

#Review #Reasoning Compression #Self-Distillation #On-Policy Learning #Large Language Models #Mathematical Reasoning #Knowledge Distillation #Efficient Inference

2026년 3월 5일

[논문리뷰] LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval

본 논문은 강력한 추론 능력을 가진 LLM 기반 dense retriever 가 복잡한 쿼리에 대해 높은 지연 시간 없이 추론 능력을 활용하지 못하는 문제를 해결하고자 합니다.

#Review #Dense Retrieval #LLMs #Reasoning #Knowledge Distillation #Latent Space #Self-Distillation #Chain-of-Thought

2026년 3월 2일

[논문리뷰] Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models

본 논문은 대규모 언어 모델(LLM)의 추론 능력 강화를 위한 강화 학습(RL) 기법인 RLVR(Reinforcement Learning with Verifiable Rewards)의 메타 학습 병목 현상 을 해결하는 것을 목표로 합니다.

#Review #Reinforcement Learning #Large Language Models #Meta-Learning #Error Attribution #Knowledge Internalization #Self-Distillation #Verifiable Rewards

2026년 2월 11일

[논문리뷰] daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently

본 논문은 대규모 언어 모델(LLM)이 단기 작업에서 뛰어난 성능을 보임에도 불구하고, 실제와 같은 복잡한 장기 에이전트 워크플로우로 확장하는 데 필요한 고품질 훈련 데이터 부족 문제를 해결하고자 합니다.

#Review #Long-Horizon Agency #Data Synthesis #Pull Request Chains #Software Evolution #LLM Training #Agentic AI #Self-Distillation #Code Generation

2026년 2월 3일

[논문리뷰] THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

본 논문은 강화 학습(RL) 기반의 추론 모델들이 복잡한 추론 태스크에서 성능을 극대화하는 과정에서 발생하는 '안전성 저하(safety tax)' 문제를 해결하고자 합니다.

#Review #Large Reasoning Models #Safety Alignment #Self-Distillation #Refusal Steering #Distributional Shift #Chain-of-Thought #Reinforcement Learning

2026년 2월 1일

[논문리뷰] Reinforcement Learning via Self-Distillation

대규모 언어 모델(LLM)의 강화 학습(RL) 후 훈련에서 발생하는 심각한 신용 할당(credit assignment) 병목 현상 을 해결하는 것이 목표입니다. 특히, 코드 생성이나 수학 문제 해결과 같은 검증 가능한 도메인 에서 스칼라 보상 이 아닌 풍부한 텍스트 피드백 을 활용하여 학습 효율성을 극대화하고자 합니다.

#Review #Reinforcement Learning #Self-Distillation #Large Language Models (LLMs)#Rich Feedback #Credit Assignment #Policy Optimization #RLHF #Code Generation #Test-Time Training

2026년 1월 28일

[논문리뷰] FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing

본 논문은 제어 가능한 비디오 편집 패러다임인 First-Frame Propagation (FFP) 의 주요 한계를 해결하고자 합니다.

#Review #Video Editing #First-Frame Propagation (FFP)#Large-Scale Dataset #Generative Models #Temporal Consistency #Spatio-Temporal RoPE #Self-Distillation

2026년 1월 6일

[논문리뷰] SkillFactory: Self-Distillation For Learning Cognitive Behaviors

본 논문은 기반 언어 모델(LLM)이 처음부터 갖추지 못한 인지적 스킬(예: 검증, 백트래킹, 재시도) 을 외부의 더 강력한 모델 없이 스스로 학습하도록 하는 SkillFactory 프레임워크를 제안합니다. 이를 통해 모델이 복잡한 추론 태스크에서 더 잘 일반화하고 견고성을 갖추도록 하는 것을 목표로 합니다.

#Review #Self-Distillation #Cognitive Skills #Reinforcement Learning #Supervised Fine-Tuning #Language Models #Reasoning #Verification #Retrying

2025년 12월 3일

[논문리뷰] Step-Audio-R1 Technical Report

오디오 언어 모델이 추론 과정을 거치면 성능이 저하되는 기존의 문제, 즉 '텍스트 대리 추론' 현상을 해결하고, 오디오 도메인에서 진정한 추론 능력을 성공적으로 활성화하는 것을 목표로 합니다. 이는 오디오 인텔리전스에 대한 심층적 사고의 이점을 입증하고자 합니다.

#Review #Audio Reasoning #Multimodal LLMs #Modality-Grounded Reasoning Distillation (MGRD)#Chain-of-Thought #Reinforcement Learning #Audio Understanding #Self-Distillation

2025년 11월 20일

[논문리뷰] Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

본 논문의 핵심 목표는 실세계 다중 뷰 데이터 없이 단일 이미지 또는 비디오 입력으로부터 고품질의 3D 및 4D 장면을 생성하는 것입니다.

#Review #Generative AI #3D Scene Reconstruction #Video Diffusion Models #Self-Distillation #3D Gaussian Splatting #Dynamic 4D Generation #Monocular Input

2025년 9월 24일

[논문리뷰] Artificial Hippocampus Networks for Efficient Long-Context Modeling

본 논문은 RNN의 효율적인 고정 크기 메모리와 Transformer의 손실 없는 확장 가능 메모리 사이의 근본적인 트레이드오프를 해결하여, 장문 컨텍스트 모델링에서 효율성과 정확도를 동시에 달성하는 것을 목표로 합니다.

#Review #Long-Context Modeling #Transformer #RNN #Memory Management #Self-Distillation #Attention Mechanism #Artificial Hippocampus Networks #Cognitive Science

2025년 10월 9일

[논문리뷰] dParallel: Learnable Parallel Decoding for dLLMs

본 연구는 확산 언어 모델(dLLMs)이 가진 병렬 디코딩 잠재력 을 충분히 활용하지 못하는 문제, 즉 기존 dLLMs가 성능 유지를 위해 거의 토큰 길이만큼의 디코딩 스텝을 요구하는 병목 현상을 해결하는 것을 목표로 합니다.

#Review #Diffusion Language Models #Parallel Decoding #Inference Acceleration #Certainty Distillation #Self-Distillation #Masked Language Models #LLaDA

2025년 10월 1일