#Exploration-Exploitation

17 posts

[Paper Review] Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

[Paper Review] On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models

[Paper Review] Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities

[Paper Review] CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs

[Paper Review] Diversity or Precision? A Deep Dive into Next Token Prediction

[Paper Review] Exploration v.s. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward

[Paper Review] Quantile Advantage Estimation for Entropy-Safe Reasoning

[Paper Review] Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

[Paper Review] AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

[Paper Review] Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

[Paper Review] Exploitation Is All You Need... for Exploration

[Paper Review] Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

[Paper Review] NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents