[Paper Review] MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
A detailed review of the paper 'MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome', published on arXiv.
#Review #Deep Research #Multimodal Benchmark #Process-Centric Evaluation #Factuality Verification #Agentic Systems #Adaptive Synthesis
April 1, 2026

[Paper Review] How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing
A detailed review of the paper 'How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing', published on arXiv by Haochen Tian.
#Review #Visual Instruction #Image Editing #Multimodal Benchmark #LMM-as-a-judge #Deictic Grounding #Morphological Manipulation #Causal Reasoning #Generative Models
February 2, 2026

[Paper Review] What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
A detailed review of the paper 'What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models', published on arXiv.
#Review #Vision-Language Models #Under-specified Queries #Multimodal Benchmark #HAERAE-Vision #Query Explicitation #Retrieval Augmentation #Cultural Knowledge #Korean QA
January 12, 2026

[Paper Review] $\left|\,\circlearrowright\,\text{BUS}\,\right|$: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles
A detailed review of the paper '$\left|\,\circlearrowright\,\text{BUS}\,\right|$: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles', published on arXiv by Deepiha S.
#Review #Vision-Language Models #Multimodal Benchmark #Rebus Puzzles #In-Context Learning #Reasoning #ControlNet #Prompt Engineering
November 9, 2025