[Paper Review] Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial
A detailed review of the paper 'Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial', posted on arXiv by Zhenzhi Tan.
#Review #Bayesian Optimization #Scientific Discovery #Gaussian Process #Acquisition Function #Surrogate Model #Automated Experimentation #Sample Efficiency
April 2, 2026

[Paper Review] MolmoPoint: Better Pointing for VLMs with Grounding Tokens
A detailed review of the paper 'MolmoPoint: Better Pointing for VLMs with Grounding Tokens', posted on arXiv by Yue Yang.
#Review #Vision-Language Models #Grounding Tokens #Pointing #GUI Grounding #Video Grounding #Sample Efficiency
March 30, 2026

[Paper Review] Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
A detailed review of the paper 'Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning', posted on arXiv.
#Review #Reinforcement Learning #Large Language Models #Natural Language Feedback #Exploration #Group-Level Feedback #Self-Refinement #Sample Efficiency
March 11, 2026

[Paper Review] Heterogeneous Agent Collaborative Reinforcement Learning
A detailed review of the paper 'Heterogeneous Agent Collaborative Reinforcement Learning', posted on arXiv.
#Review #Reinforcement Learning #Large Language Models #Multi-Agent Systems #Policy Optimization #Heterogeneous Agents #Sample Efficiency #Knowledge Transfer #RLVR
March 4, 2026

[Paper Review] Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification
A detailed review of the paper 'Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification', posted on arXiv.
#Review #Reinforcement Learning #LLM Reasoning #Instruction Purification #Interference Tokens #Sample Efficiency #Policy Optimization #Verifiable Rewards
February 3, 2026

[Paper Review] SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization
A detailed review of the paper 'SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization', posted on arXiv by Bolin Ni.
#Review #Reinforcement Learning #Reward Shaping #Agent Optimization #GUI Automation #Complex Reasoning #Sample Efficiency #Tiered Rewards
February 1, 2026

[Paper Review] Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
A detailed review of the paper 'Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning', posted on arXiv by Shuai Zhang.
#Review #Agentic AI #Reinforcement Learning #Long-Horizon Tasks #Dynamic Branching #Strategic Exploration #LLM Agents #Sample Efficiency #Policy Optimization
January 28, 2026

[Paper Review] TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
A detailed review of the paper 'TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models', posted on arXiv by Weirui Ye.
#Review #Reinforcement Learning #Diffusion Models #Generative Models #Tree Search #Sample Efficiency #Credit Assignment #GRPO #Visual Generative Models
December 9, 2025

[Paper Review] WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
A detailed review of the paper 'WMPO: World Model-based Policy Optimization for Vision-Language-Action Models', posted on arXiv.
#Review #Vision-Language-Action (VLA) #Reinforcement Learning (RL) #Model-based RL #World Models #Policy Optimization #Robotics #Sample Efficiency #Self-correction
November 12, 2025

[Paper Review] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents
A detailed review of the paper 'Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents', posted on arXiv.
#Review #LLM Agents #Reinforcement Learning #Multi-Turn Interactions #Reward Sparsity #Information Gain #Policy Optimization #Ground-Truth Awareness #Sample Efficiency
October 17, 2025

[Paper Review] CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs
A detailed review of the paper 'CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs', posted on arXiv by Hengyi Cai.
#Review #Curriculum Learning #LLMs #Reasoning #Gradient Optimization #Reinforcement Learning #Bayesian Inference #Sample Efficiency
October 2, 2025

[Paper Review] Residual Off-Policy RL for Finetuning Behavior Cloning Policies
A detailed review of the paper 'Residual Off-Policy RL for Finetuning Behavior Cloning Policies', posted on arXiv by Pieter Abbeel.
#Review #Reinforcement Learning (RL) #Behavior Cloning (BC) #Residual Learning #Off-Policy RL #Robot Manipulation #Real-World Robotics #High-DoF Systems #Sample Efficiency
September 26, 2025

[Paper Review] InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
A detailed review of the paper 'InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities', posted on arXiv by Zhijie Sang.
#Review #LLM Alignment #Reasoning #Data Curation #Supervised Fine-tuning (SFT) #Direct Preference Optimization (DPO) #Sample Efficiency #Scalability #Multi-dimensional Filtering
August 8, 2025