[논문리뷰] Blind to the Human Touch: Overlap Bias in LLM-Based Summary EvaluationPuneet Mathur이 arXiv에 게시한 'Blind to the Human Touch: Overlap Bias in LLM-Based Summary Evaluation' 논문에 대한 자세한 리뷰입니다.#Review#LLM-as-a-judge#Summarization Evaluation#Overlap Bias#Position Bias#N-gram Metrics#Gemma#Llama#Evaluation Bias2026년 2월 16일댓글 수 로딩 중
[논문리뷰] Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance BoostMin Yang이 arXiv에 게시한 'Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost' 논문에 대한 자세한 리뷰입니다.#Review#Machine Translation Evaluation#Large Reasoning Models#LLM-as-a-judge#MQM#Fine-tuning#Thinking Calibration#Computational Efficiency#Meta-evaluation2025년 10월 27일댓글 수 로딩 중
[논문리뷰] Who's Your Judge? On the Detectability of LLM-Generated JudgmentsarXiv에 게시된 'Who's Your Judge? On the Detectability of LLM-Generated Judgments' 논문에 대한 자세한 리뷰입니다.#Review#LLM-as-a-judge#Judgment Detection#Bias Quantification#Feature Engineering#Interpretability#Peer Review#AI Ethics#Evaluation2025년 10월 1일댓글 수 로딩 중
[논문리뷰] DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research SynthesisIon Stoica이 arXiv에 게시한 'DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis' 논문에 대한 자세한 리뷰입니다.#Review#Generative Research Synthesis#Live Benchmark#Automated Evaluation#LLM-as-a-judge#Related Work Generation#Retrieval-Augmented Generation#Verifiability2025년 8월 28일댓글 수 로딩 중