#Test-Time RL

1 post

[Paper Review] Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

The paper targets entropy collapse, a problem LLMs face in label-free training, where a model tries to improve itself without labels or external evaluation.

#Review #Label-free Reinforcement Learning #LLMs #Self-improvement #Entropy Collapse #Novelty Reward #Test-Time RL #GRPO #Evolutionary Computing Principles

September 19, 2025