#Visual Reasoning

57 posts

[Paper Review] Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models

[Paper Review] Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

[Paper Review] Visual Reasoning through Tool-supervised Reinforcement Learning

[Paper Review] Vero: An Open RL Recipe for General Visual Reasoning

[Paper Review] ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

[Paper Review] Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

[Paper Review] XSkill: Continual Learning from Experience and Skills in Multimodal Agents

[Paper Review] CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation

[Paper Review] AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

[Paper Review] Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks

[Paper Review] Imagination Helps Visual Reasoning, But Not Yet in Latent Space

[Paper Review] UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

[Paper Review] MetaphorStar: Image Metaphor Understanding and Reasoning with End-to-End Visual Reinforcement Learning

[Paper Review] Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning

[Paper Review] AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

[Paper Review] CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

[Paper Review] CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving

[Paper Review] Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models

[Paper Review] V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions

[Paper Review] ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning

[Paper Review] OneThinker: All-in-one Reasoning Model for Image and Video

[Paper Review] Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization

[Paper Review] Artemis: Structured Visual Reasoning for Perception Policy Learning

[Paper Review] SO-Bench: A Structural Output Evaluation of Multimodal LLMs

[Paper Review] MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

[Paper Review] MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity

[Paper Review] ChartM^3: A Multi-Stage Code-Driven Pipeline for Constructing Multi-Dimensional and Multi-Step Visual Reasoning Data in Chart Comprehension

[Paper Review] Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models

[Paper Review] Reinforced Visual Perception with Tools

[Paper Review] Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark

[Paper Review] PairUni: Pairwise Training for Unified Multimodal Language Models

[Paper Review] Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs

[Paper Review] VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
