[Paper Review] Less Detail, Better Answers: Degradation-Driven Prompting for VQA
A detailed review of the paper 'Less Detail, Better Answers: Degradation-Driven Prompting for VQA', posted on arXiv.
#Review #Vision-Language Models #Visual Question Answering #Degradation-Driven Prompting #Agentic Perception #Structural Bottleneck
April 6, 2026

[Paper Review] Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
A detailed review of the paper 'Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion', posted on arXiv.
#Review #Multimodal AI #Discrete Diffusion Models #Masked Language Modeling #Unified Generative Models #Any-to-Any #Speech-to-Image #Visual Question Answering
March 10, 2026

[Paper Review] When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains
A detailed review of the paper 'When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains', posted on arXiv.
#Review #Medical VLMs #Reinforcement Learning #Supervised Fine-tuning #Visual Question Answering #Multi-modality #Reasoning Capacity #MedMNIST
March 2, 2026

[Paper Review] LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models
A detailed review of the paper 'LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models', posted on arXiv.
#Review #Multimodal Diffusion Models #Reasoning #Reinforcement Learning #Supervised Finetuning #Visual Question Answering #Image Editing #Object Grounding #Policy Gradient
February 16, 2026

[Paper Review] Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
A detailed review of the paper 'Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models', posted on arXiv by Zhen Fang.
#Review #Multimodal Large Language Models #Deep Research #Agentic AI #Tool Use #Visual Question Answering #Reinforcement Learning #Multi-scale Search
February 2, 2026

[Paper Review] Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
A detailed review of the paper 'Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models', posted on arXiv by Shuang Chen.
#Review #Multimodal Large Language Models #Visual Question Answering #Deep Research #Benchmark #Visual Search #Textual Search #Cropped Search #Evaluation
February 2, 2026

[Paper Review] Toward Cognitive Supersensing in Multimodal Large Language Model
A detailed review of the paper 'Toward Cognitive Supersensing in Multimodal Large Language Model', posted on arXiv by Yifan Xu.
#Review #Multimodal Large Language Models #Cognitive Reasoning #Visual Imagery #Latent Representations #Reinforcement Learning #Visual Question Answering #Benchmark
February 2, 2026

[Paper Review] MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
A detailed review of the paper 'MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods', posted on arXiv.
#Review #Multimodal Reasoning #Data-centric AI #Chain-of-Thought #Large Language Models #Visual Question Answering #STEM Reasoning #Dataset #Fine-tuning
January 29, 2026

[Paper Review] UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
A detailed review of the paper 'UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture', posted on arXiv by Kaiwen Zhu.
#Review #Perceptual Understanding #Image Aesthetics #Image Quality #Image Structure #Image Texture #MLLM Benchmark #Visual Question Answering #Reward Model
December 28, 2025

[Paper Review] Jina-VLM: Small Multilingual Vision Language Model
A detailed review of the paper 'Jina-VLM: Small Multilingual Vision Language Model', posted on arXiv.
#Review #Vision-Language Model #Multilingual VLM #Small VLM #Visual Question Answering #Attention Pooling #Image Tiling #SigLIP #Qwen
December 3, 2025

[Paper Review] Scaling Spatial Intelligence with Multimodal Foundation Models
A detailed review of the paper 'Scaling Spatial Intelligence with Multimodal Foundation Models', posted on arXiv.
#Review #Spatial Intelligence #Multimodal Foundation Models #Data Scaling #Perspective-taking #Visual Question Answering #Emergent Capabilities #Embodied AI #Benchmark Evaluation
November 20, 2025

[Paper Review] Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
A detailed review of the paper 'Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering', posted on arXiv.
#Review #Visual Question Answering #Retrieval-Augmented Generation #Multimodal AI #Reinforcement Learning #Knowledge Base #Tool Learning #Information Filtering
October 21, 2025

[Paper Review] DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
A detailed review of the paper 'DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search', posted on arXiv.
#Review #Multimodal LLM #Web Search #Visual Question Answering #Reinforcement Learning #Image Cropping #Self-Correction #Tool Use
October 15, 2025

[Paper Review] VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes
A detailed review of the paper 'VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes', posted on arXiv by Muhammad Huzaifa.
#Review #Visual Question Answering #Multimodal Models #Dense Scenes #Fine-Grained Perception #Benchmark #Error Analysis #Counting #OCR
October 1, 2025

[Paper Review] VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery
A detailed review of the paper 'VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery', posted on arXiv by Shiya Huang.
#Review #Multimodal Large Language Models #Visual Question Answering #Reinforcement Learning #Cultural Heritage #Ancient Greek Pottery #Supervised Fine-Tuning #Benchmark
September 23, 2025

[Paper Review] MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
A detailed review of the paper 'MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer', posted on arXiv by jialingt.
#Review #Multimodal LLM #Hybrid Tokenizer #Text-to-Image Generation #Visual Question Answering #Autoregressive Model #Diffusion Decoder #Unified Architecture #Model Scaling
September 22, 2025

[Paper Review] MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook
A detailed review of the paper 'MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook', posted on arXiv by Bowen Zhou.
#Review #Multimodal Reasoning #Large Language Models (LLMs) #Multimodal Large Language Models (MLLMs) #Visual Grounding #Visual Question Answering #Advertisement Video Analysis #Real-world Scenarios #Challenge Benchmark
September 18, 2025

[Paper Review] GenExam: A Multidisciplinary Text-to-Image Exam
A detailed review of the paper 'GenExam: A Multidisciplinary Text-to-Image Exam', posted on arXiv by Yu Qiao.
#Review #Text-to-Image Generation #Multidisciplinary #Benchmark #Evaluation #AGI #Reasoning #Scoring System #Visual Question Answering
September 18, 2025

[Paper Review] Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge
A detailed review of the paper 'Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge', posted on arXiv by Wentao Zhang.
#Review #Multimodal Reasoning #Science AI #Caption-assisted Reasoning #SeePhys Challenge #Large Language Models #Visual Question Answering #Physics Problems #Cross-modal Alignment
September 17, 2025

[Paper Review] Measuring Epistemic Humility in Multimodal Large Language Models
A detailed review of the paper 'Measuring Epistemic Humility in Multimodal Large Language Models', posted on arXiv by Kaiyang Zhou.
#Review #Multimodal Large Language Models #Hallucination #Epistemic Humility #Benchmark #False-Option Rejection #Visual Question Answering #Scene Graph
September 16, 2025

[Paper Review] WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
A detailed review of the paper 'WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning', posted on arXiv by Amit Namburi.
#Review #Multimodal Large Language Models #Symbolic Music Reasoning #Music Score Analysis #Benchmarking #Visual Question Answering #In-the-Wild Data #Music Theory
September 8, 2025

[Paper Review] 'Does the cafe entrance look accessible? Where is the door?' Towards Geospatial AI Agents for Visual Inquiries
A detailed review of the paper 'Does the cafe entrance look accessible? Where is the door? Towards Geospatial AI Agents for Visual Inquiries', posted on arXiv by Xia Su.
#Review #Geospatial AI #Multimodal AI Agents #Visual Question Answering #Accessibility #Street View Imagery #Spatial Reasoning #Human-Computer Interaction
August 22, 2025

[Paper Review] Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling
A detailed review of the paper 'Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling', posted on arXiv by Ruolin Shen.
#Review #Visual Document Understanding #Visual Question Answering #Multi-Agent System #Test-Time Scaling #Self-Correction #Mixed Reward Modeling #Large Language Models
August 8, 2025