[Paper Review] UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation. A detailed review of the paper 'UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation', posted on arXiv. #Review #Unified Policy Optimization #Reinforcement Learning #Reasoning-Driven Generation #Interleaved Generation #Flow Matching #Markov Decision Process #Classifier-Free Guidance #Reward Hacking. March 24, 2026
[Paper Review] Learning Unmasking Policies for Diffusion Language Models. A detailed review of the paper 'Learning Unmasking Policies for Diffusion Language Models', posted on arXiv. #Review #Diffusion Language Models #Reinforcement Learning #Masked Diffusion #Sampling Policy #Inference Optimization #Markov Decision Process #Generative AI #Text Generation. December 10, 2025
[Paper Review] Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning. A detailed review of the paper 'Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning', posted on arXiv by Yucong Luo. #Review #LLM Agents #Reinforcement Learning #Markov Decision Process #Tool Use #Multi-turn Interaction #Policy Optimization #Reward Shaping #Agent Framework. November 18, 2025
[Paper Review] DynaAct: Large Language Model Reasoning with Dynamic Action Spaces. A detailed review of the paper 'DynaAct: Large Language Model Reasoning with Dynamic Action Spaces', posted on arXiv by Lingpeng Kong. #Review #Large Language Models #Sequential Reasoning #Action Space Construction #Submodular Optimization #Markov Decision Process #Monte Carlo Tree Search #Utility-Diversity Trade-off. November 11, 2025
[Paper Review] IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction. A detailed review of the paper 'IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction', posted on arXiv by Haotian Xu. #Review #Long-Horizon Agents #Markov Decision Process #Workspace Reconstruction #Reinforcement Learning #Context Management #Iterative Deep Research #LLM Agents #Efficiency-Aware Policy Optimization. November 10, 2025
[Paper Review] PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments. A detailed review of the paper 'PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments', posted on arXiv by Chaoyang Zhao. #Review #Active Visual Reasoning #MLLM #Physical Environments #Partially Observable #Markov Decision Process #Chain-of-Thought #Embodied AI #CLEVR-AVR. October 27, 2025
[Paper Review] Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards. A detailed review of the paper 'Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards', posted on arXiv by Binxing Jiao. #Review #Reinforcement Learning #LLM Reasoning #Policy Valuation #Markov Decision Process #Diversity #Math Reasoning #Verifiable Rewards. September 30, 2025
[Paper Review] WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents. A detailed review of the paper 'WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents', posted on arXiv by Wenbiao Yin. #Review #Agentic AI #Deep Research #Iterative Reasoning #Long-Horizon Tasks #Context Management #Data Synthesis #Tool-Augmented LLMs #Markov Decision Process. September 17, 2025
[Paper Review] A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models. A detailed review of the paper 'A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models', posted on arXiv by Zishang Jiang. #Review #Self-Refinement #Language Models #Reinforcement Learning #Proactive AI #Generation Process #Markov Decision Process #Adaptive Learning #LLM Efficiency. August 20, 2025
[Paper Review] Agent Lightning: Train ANY AI Agents with Reinforcement Learning. A detailed review of the paper 'Agent Lightning: Train ANY AI Agents with Reinforcement Learning', posted on arXiv by Zilong Wang. #Review #Reinforcement Learning #Large Language Models #AI Agents #Framework #Markov Decision Process #Hierarchical RL #Training-Agent Disaggregation #Observability. August 7, 2025