#Contamination-Free

1 post

[Paper Review] FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions

This paper is a preliminary report that evaluates state-of-the-art large reasoning models (LRMs) on automatically verifiable textual and visual questions in a contamination-free manner.

#Review #Large Reasoning Models #LLM Evaluation #Multimodal AI #Reasoning Behaviors #Hallucination #Contamination-Free #AI Safety #Instruction Following

September 23, 2025