[Paper Review] OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
A detailed review of the paper 'OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic', published on arXiv.
#Review #Autonomous Driving #Reinforcement Fine-tuning #LLM-as-Critic #Vision-Language Model #End-to-End Learning #Chain-of-Thought #Trajectory Planning
December 1, 2025

[Paper Review] Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
A detailed review of the paper 'Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training', published on arXiv.
#Review #LLM #Reinforcement Fine-tuning #Reward Modeling #Reward Over-optimization #Rubric-based Rewards #High-reward Tail #Off-policy Data #LLM Alignment
September 29, 2025