#Adaptive Training

2 posts

[Paper Review] Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning

This work proposes a method by which large language models (LLMs) continually improve their reasoning ability on a target task at test time.

#Review #Test-Time Curriculum #Reinforcement Learning #Large Language Models #Self-Curated Learning #Continual Learning #Reasoning Benchmarks #Adaptive Training

October 7, 2025

[Paper Review] DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

The paper aims to address the training plateaus and insufficient exploration that arise in reinforcement learning with verifiable rewards (RLVR), a method used to improve the reasoning ability of LLMs.

#Review #Reinforcement Learning with Verifiable Rewards (RLVR) #Monte Carlo Tree Search (MCTS) #Mathematical Reasoning #Large Language Models (LLMs) #Systematic Exploration #Adaptive Training #Tree-GRPO

October 2, 2025