#Reward Shaping

24 posts

[Paper Review] A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

[Paper Review] Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data

[Paper Review] Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

[Paper Review] Continual GUI Agents

[Paper Review] Diversity or Precision? A Deep Dive into Next Token Prediction

[Paper Review] SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

[Paper Review] Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

[Paper Review] MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism

[Paper Review] Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following

[Paper Review] Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning

[Paper Review] TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs

[Paper Review] A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning

[Paper Review] Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
