#Production-Evaluation Gap

1 post

[Paper Review] An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models

This paper raises the problem of the Production-Evaluation Gap: Large Reasoning Models excel at producing reasoning outputs, yet show serious deficiencies when asked to evaluate reasoning for logical errors.

#Review #Large Reasoning Models #Production-Evaluation Gap #Answer Confirmation Bias #Reasoning Evaluation #Chain-of-Thought #Causal Patching

June 14, 2026