[Paper Review] Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles
A detailed review of the paper 'Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles', published on arXiv.
#Review #Deep Research Agents #LLM Evaluation #Wikipedia #Good Articles #Factuality #Writing Quality #Benchmark #Hallucinations #Verifiability
February 2, 2026

[Paper Review] DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis
A detailed review of the paper 'DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis', posted to arXiv by Ion Stoica.
#Review #Generative Research Synthesis #Live Benchmark #Automated Evaluation #LLM-as-a-judge #Related Work Generation #Retrieval-Augmented Generation #Verifiability
August 28, 2025