[논문리뷰] BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement LearningarXiv에 게시된 'BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning' 논문에 대한 자세한 리뷰입니다.#Review#LLM Reinforcement Learning#Trust Region#Policy Optimization#Ratio Clipping#f-divergence#Entropy Regularization#Exploration#BandPO2026년 3월 8일댓글 수 로딩 중
[논문리뷰] STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious TokensZhilong Zheng이 arXiv에 게시한 'STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens' 논문에 대한 자세한 리뷰입니다.#Review#Reinforcement Learning#Large Language Models#Training Stability#Policy Optimization#Spurious Tokens#Entropy Regularization#Gradient Modulation2026년 2월 17일댓글 수 로딩 중
[논문리뷰] Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy ConstraintsHuazhe Xu이 arXiv에 게시한 'Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints' 논문에 대한 자세한 리뷰입니다.#Review#Entropy Regularization#Activation Functions#Continuous Control#Large Language Models#Image Classification#Reinforcement Learning#Policy Stochasticity#Entropy Constraints2025년 10월 10일댓글 수 로딩 중
[논문리뷰] TTRV: Test-Time Reinforcement Learning for Vision Language ModelsSerena Yeung-Levy이 arXiv에 게시한 'TTRV: Test-Time Reinforcement Learning for Vision Language Models' 논문에 대한 자세한 리뷰입니다.#Review#Vision-Language Models (VLMs)#Reinforcement Learning (RL)#Test-Time Adaptation#Unsupervised Learning#Image Recognition#Visual Question Answering (VQA)#Group Relative Policy Optimization (GRPO)#Entropy Regularization2025년 10월 9일댓글 수 로딩 중
[논문리뷰] EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement LearningLi Yu-Jhe이 arXiv에 게시한 'EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning' 논문에 대한 자세한 리뷰입니다.#Review#LLM Agents#Reinforcement Learning#Entropy Regularization#Policy Optimization#Sparse Rewards#Multi-turn Environments#Exploration-Exploitation2025년 9월 29일댓글 수 로딩 중
[논문리뷰] Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction FollowingLiang Xu이 arXiv에 게시한 'Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following' 논문에 대한 자세한 리뷰입니다.#Review#LLMs#Instruction Following#Reasoning#Reinforcement Learning#Supervised Fine-tuning#Entropy Regularization#Self-Checking#Previewing2025년 8월 7일댓글 수 로딩 중