[Paper Review] Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes. A detailed review of the paper 'Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes', posted on arXiv. #Review #On-policy Distillation #LLM Post-training #Sampled-token OPD #Variance Reduction #Local Support Matching #Truncated Reverse-KL #Top-p Rollout Sampling #Special Token Masking. March 26, 2026
[Paper Review] VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training. A detailed review of the paper 'VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training', posted on arXiv. #Review #Off-Policy RL #LLM Training #Importance Sampling #Variance Reduction #Variational Optimization #Policy Gradient #Sequence-Level Optimization #Reinforcement Learning. February 22, 2026
[Paper Review] Online Causal Kalman Filtering for Stable and Effective Policy Optimization. A detailed review of the paper 'Online Causal Kalman Filtering for Stable and Effective Policy Optimization', posted on arXiv. #Review #Reinforcement Learning (RL) #Large Language Models (LLMs) #Policy Optimization #Importance Sampling (IS) Ratio #Kalman Filter #Variance Reduction #Math Reasoning. February 11, 2026
[Paper Review] MARS-M: When Variance Reduction Meets Matrices. A detailed review of the paper 'MARS-M: When Variance Reduction Meets Matrices', posted on arXiv. #Review #Variance Reduction #Matrix-based Optimizer #LLM Training #Deep Learning Optimization #Moonlight #MARS-M #Stochastic Gradient Descent. October 28, 2025
[Paper Review] Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training. A detailed review of the paper 'Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training', posted on arXiv. #Review #Reinforcement Learning (RL) #Large Language Models (LLMs) #Adaptive Sampling #Policy Gradient #Reward Optimization #Signal Collapse #Variance Reduction. October 7, 2025
[Paper Review] ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction. A detailed review of the paper 'ReSWD: ReSTIR'd, not shaken. Combining Reservoir Sampling and Sliced Wasserstein Distance for Variance Reduction', posted on arXiv. #Review #Sliced Wasserstein Distance #Reservoir Sampling #Variance Reduction #Distribution Matching #Diffusion Guidance #Color Correction #Monte Carlo Estimation. October 2, 2025
[Paper Review] Single-stream Policy Optimization. A detailed review of the paper 'Single-stream Policy Optimization', posted on arXiv by Zihan Ding. #Review #Reinforcement Learning #LLM Optimization #Policy Gradient #Variance Reduction #Adaptive Sampling #Scalability #Agentic Systems #RLVR. September 17, 2025