[Paper Review] Blind to the Human Touch: Overlap Bias in LLM-Based Summary Evaluation
A detailed review of the paper 'Blind to the Human Touch: Overlap Bias in LLM-Based Summary Evaluation', posted on arXiv by Puneet Mathur.
#Review #LLM-as-a-judge #Summarization Evaluation #Overlap Bias #Position Bias #N-gram Metrics #Gemma #Llama #Evaluation Bias
February 16, 2026
[Paper Review] Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
A detailed review of the paper 'Neither Valid nor Reliable? Investigating the Use of LLMs as Judges', posted on arXiv by Golnoosh Farnadi.
#Review #LLMs as Judges #NLG Evaluation #Measurement Theory #Validity #Reliability #Evaluation Bias #Scalability #Responsible AI
August 26, 2025