[Paper Review] Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

A detailed review of the paper 'Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges', published on arXiv.

#Review #Reward Hacking #Alignment #RLHF #Proxy Compression Hypothesis #Emergent Misalignment #Large Models #Scalable Oversight

April 22, 2026