[Paper Review] OpenWorldLib: A Unified Codebase and Definition of Advanced World Models. A detailed review of the paper 'OpenWorldLib: A Unified Codebase and Definition of Advanced World Models', posted on arXiv. #Review #World Models #Unified Inference Framework #Multimodal Reasoning #Vision-Language-Action #3D Generation #Interactive Video Generation (April 6, 2026)
[Paper Review] PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning. A detailed review of the paper 'PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning', posted on arXiv by Zhixuan Zhao. #Review #Video Benchmark #Multimodal Reasoning #Perception-Centric #Long-Horizon #Test-Time Thinking (April 1, 2026)
[Paper Review] When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning. A detailed review of the paper 'When Models Judge Themselves: Unsupervised Self-Evolution for Multimodal Reasoning', posted on arXiv. #Review #Unsupervised Self-Evolution #Multimodal Reasoning #Consistency-Based Reward #Judge Modulation #Group Relative Policy Optimization (GRPO) #Policy Updates #Mathematical Reasoning #Large Language Models (March 25, 2026)
[Paper Review] From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning. A detailed review of the paper 'From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning', posted on arXiv. #Review #Multimodal Reasoning #Cold-Start Initialization #Attention Mechanism #Visual Grounding #Large Multimodal Models (LMMs) #Reinforcement Learning (RLHF) #Data Synthesis #Visual Attention Score (VAS) (March 9, 2026)
[Paper Review] MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning. A detailed review of the paper 'MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning', posted on arXiv. #Review #Multimodal Reasoning #Multi-Image Analysis #Real-life Scenarios #Benchmark #MLLMs Evaluation #Chain-of-Thought #Reasoning Types (March 2, 2026)
[Paper Review] From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models. A detailed review of the paper 'From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models', posted on arXiv by Wei Ye. #Review #Large Multimodal Models #Iterative Training #Diagnostic-Driven Learning #Reinforcement Learning #Multimodal Reasoning #Data Generation #Agent Systems (February 26, 2026)
[Paper Review] DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning. A detailed review of the paper 'DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning', posted on arXiv by Wei Wang. #Review #Multimodal Reasoning #Mathematical Dataset #RLVR #Data Curation #Visual Diversity #K12 Mathematics #Large Multimodal Models (February 22, 2026)
[Paper Review] BrowseComp-V^3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents. A detailed review of the paper 'BrowseComp-V^3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents', posted on arXiv by Yanzhe Dan. #Review #Multimodal LLMs #Web Browsing Agents #Deep Search #Benchmark #Tool Use #Process Evaluation #Multimodal Reasoning #Open-world QA (February 16, 2026)
[Paper Review] Thinking with Drafting: Optical Decompression via Logical Reconstruction. A detailed review of the paper 'Thinking with Drafting: Optical Decompression via Logical Reconstruction', posted on arXiv. #Review #Multimodal Reasoning #Visual Algebra #Domain-Specific Language #Optical Decompression #Logical Reconstruction #Bar Model #MLLMs #Verification (February 12, 2026)
[Paper Review] When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models. A detailed review of the paper 'When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models', posted on arXiv. #Review #Vision-Centric Jailbreak Attack #Image Editing Models #Safety Benchmark #IESBench #Multimodal Reasoning #Adversarial Attack #Defense Mechanism (February 11, 2026)
[Paper Review] Chain of Mindset: Reasoning with Adaptive Cognitive Modes. A detailed review of the paper 'Chain of Mindset: Reasoning with Adaptive Cognitive Modes', posted on arXiv. #Review #Adaptive Reasoning #Cognitive Modes #Large Language Models (LLMs) #Agentic AI #Multimodal Reasoning #Mindset Orchestration #Contextual Filtering #Training-free Framework (February 10, 2026)
[Paper Review] Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR. A detailed review of the paper 'Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR', posted on arXiv by Zhixiong Zeng. #Review #Reinforcement Learning with Verifiable Rewards #LLMs #Policy Optimization #Response Length Bias #Sequence-level Clipping #Length-Unbiased Optimization #Multimodal Reasoning (February 5, 2026)
[Paper Review] AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process. A detailed review of the paper 'AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process', posted on arXiv by Shilin Yan. #Review #Multimodal Reasoning #Adaptive Learning #Vision-Language Models (VLMs) #Benchmarking #Mode Selection #Tool Learning #Reasoning Process Evaluation #Matthews Correlation Coefficient (MCC) (February 3, 2026)
[Paper Review] UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing. A detailed review of the paper 'UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing', posted on arXiv by Size Wu. #Review #Multimodal Reasoning #Image Generation #Image Editing #World Knowledge #Self-Reflection #Unified Framework #Text-to-Image (February 2, 2026)
[Paper Review] Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation. A detailed review of the paper 'Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation', posted on arXiv by Chenjue Zhang. #Review #Agentic Text-to-Image #Multimodal Reasoning #Cognitive Search #Knowledge-Driven Generation #Image Generation Benchmarks #Complex User Intent (February 2, 2026)
[Paper Review] MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods. A detailed review of the paper 'MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods', posted on arXiv. #Review #Multimodal Reasoning #Data-centric AI #Chain-of-Thought #Large Language Models #Visual Question Answering #STEM Reasoning #Dataset #Fine-tuning (January 29, 2026)
[Paper Review] Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models. A detailed review of the paper 'Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models', posted on arXiv. #Review #Multimodal AI #World Models #Visual Generation #Chain-of-Thought (CoT) #Multimodal Reasoning #Unified Multimodal Models #Spatial-Physical Reasoning (January 27, 2026)
[Paper Review] Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility. A detailed review of the paper 'Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility', posted on arXiv. #Review #Scientific Image Synthesis #Multimodal Reasoning #Text-to-Image #Benchmarking #Programmatic Synthesis #Large Multimodal Models #Synthetic Data (January 26, 2026)
[Paper Review] Agentic Very Long Video Understanding. A detailed review of the paper 'Agentic Very Long Video Understanding', posted on arXiv. #Review #Long-Horizon Video Understanding #Agentic AI #Entity Graph #Multimodal Reasoning #Video Question Answering #EgoLifeQA #Retrieval Augmented Generation (January 26, 2026)
[Paper Review] XR: Cross-Modal Agents for Composed Image Retrieval. A detailed review of the paper 'XR: Cross-Modal Agents for Composed Image Retrieval', posted on arXiv. #Review #Composed Image Retrieval #Cross-Modal Agents #Multimodal Reasoning #Training-free Framework #Information Retrieval #Agentic AI #Progressive Retrieval (January 21, 2026)
[Paper Review] DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models. A detailed review of the paper 'DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models', posted on arXiv by Siyuan Huang. #Review #Multimodal Reasoning #Diffusion Models #Image-to-Image Generation #Vision-centric AI #Generative AI #Spatial Planning #Constraint Satisfaction (January 1, 2026)
[Paper Review] Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking. A detailed review of the paper 'Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking', posted on arXiv by Jie Zhou. #Review #Multimodal Reasoning #Visual Thinking #Reinforcement Learning #Code Generation #Geometric Reasoning #Adaptive Reward Mechanism #Problem Solving (December 31, 2025)
[Paper Review] See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning. A detailed review of the paper 'See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning', posted on arXiv. #Review #Multimodal Reasoning #Vision-Language Models (VLMs) #Perceptual Shaping #KL-Divergence #Chart Understanding #Data Augmentation #Reinforcement Learning (RL) #GRPO (December 28, 2025)
[Paper Review] LongVideoAgent: Multi-Agent Reasoning with Long Videos. A detailed review of the paper 'LongVideoAgent: Multi-Agent Reasoning with Long Videos', posted on arXiv by Renjie Pi. #Review #Multi-Agent System #Long Video Understanding #Video Question Answering #Reinforcement Learning #Large Language Models #Temporal Grounding #Multimodal Reasoning #Tool-Augmented AI (December 23, 2025)
[Paper Review] Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image. A detailed review of the paper 'Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image', posted on arXiv. #Review #Reward Models #Multimodal LLMs #Benchmark #Text-to-Image Generation #Image Editing #Interleaved Generation #Multimodal Reasoning #MLLM-as-a-judge (December 18, 2025)
[Paper Review] A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning. A detailed review of the paper 'A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning', posted on arXiv by Hongfei Zhang. #Review #Affordance Prediction #Zero-Shot Learning #Agentic AI #Foundation Models #Multimodal Reasoning #Visual Grounding #Image Generation #Robotics (December 16, 2025)
[Paper Review] Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning. A detailed review of the paper 'Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning', posted on arXiv. #Review #Vision-Language Models #Reinforcement Learning #Self-Evolving Learning #Data-Scarce Domains #Context-First Learning #Reward Hacking Mitigation #Multimodal Reasoning #Curriculum Learning (December 8, 2025)
[Paper Review] Qwen3-VL Technical Report. A detailed review of the paper 'Qwen3-VL Technical Report', posted on arXiv. #Review #Vision-Language Model #Multimodal Reasoning #Long-Context #Interleaved Data #Mixture-of-Experts #DeepStack #Agentic AI (December 3, 2025)
[Paper Review] Agentic Learner with Grow-and-Refine Multimodal Semantic Memory. A detailed review of the paper 'Agentic Learner with Grow-and-Refine Multimodal Semantic Memory', posted on arXiv by Qunyi Xie. #Review #Multimodal LLMs #Semantic Memory #Agentic Learning #Error Attribution #Visual Reasoning #Long-term Memory #Grow-and-Refine #Multimodal Reasoning (November 27, 2025)
[Paper Review] Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens. A detailed review of the paper 'Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens', posted on arXiv by Stephanie Fu. #Review #Vision-Language Models (VLMs) #Chain-of-Thought (CoT) #Continuous Visual Tokens #Multimodal Reasoning #Perceptual Grounding #Visual Thinking #Dense Prediction (November 24, 2025)
[Paper Review] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe. A detailed review of the paper 'OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe', posted on arXiv. #Review #Multimodal Reasoning #Large Multimodal Models #Supervised Fine-tuning #Reinforcement Learning #Data Curation #Open-source #Multimodal Benchmarks (November 23, 2025)
[Paper Review] VisPlay: Self-Evolving Vision-Language Models from Images. A detailed review of the paper 'VisPlay: Self-Evolving Vision-Language Models from Images', posted on arXiv. #Review #Self-Evolving #Vision-Language Models #Reinforcement Learning #Self-Play #Unlabeled Data #Multimodal Reasoning #Group Relative Policy Optimization #Hallucination Mitigation (November 19, 2025)
[Paper Review] Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks. A detailed review of the paper 'Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks', posted on arXiv by Yiran Peng. #Review #Video Models #Spatial Reasoning #Maze Solving #Video Generation #Benchmark #Supervised Fine-tuning #Test-Time Scaling #Multimodal Reasoning (November 19, 2025)
[Paper Review] REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding. A detailed review of the paper 'REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding', posted on arXiv by Jingyang Chen. #Review #Multimodal Reasoning #Long-Form Video Understanding #Self-Reflection #Reinforcement Learning #Tool-Augmented MLLMs #Visual Rethinking #Video Question Answering #Causal Attribution (November 18, 2025)
[Paper Review] MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning. A detailed review of the paper 'MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning', posted on arXiv. #Review #Multimodal Reasoning #Mathematical Problem Solving #Self-Evolving #Iterative Fine-Tuning #Reward Models #Reflection #Large Language Models (LLMs) (November 12, 2025)
[Paper Review] Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings. A detailed review of the paper 'Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings', posted on arXiv by Jiaxin Yuan. #Review #Hallucination Mitigation #Large Vision-Language Models #Textual Embeddings #Multimodal Reasoning #Attention Mechanism #Visual Grounding #Modality Imbalance (November 9, 2025)
[Paper Review] DeepEyesV2: Toward Agentic Multimodal Model. A detailed review of the paper 'DeepEyesV2: Toward Agentic Multimodal Model', posted on arXiv by Guohai Xu. #Review #Agentic AI #Multimodal Models #Tool Use #Reinforcement Learning #Supervised Fine-tuning #Multimodal Reasoning #Web Search #Code Execution (November 9, 2025)
[Paper Review] Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm. A detailed review of the paper 'Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm', posted on arXiv. #Review #Video Generation #Multimodal Reasoning #Temporal Understanding #Spatial Reasoning #Foundation Models #AI Benchmarking #In-Context Learning #Self-Consistency (November 9, 2025)
[Paper Review] SeeingEye: Agentic Information Flow Unlocks Multimodal Reasoning In Text-only LLMs. A detailed review of the paper 'SeeingEye: Agentic Information Flow Unlocks Multimodal Reasoning In Text-only LLMs', posted on arXiv by Jiaxuan You. #Review #Multimodal Reasoning #Text-only LLM #Agentic AI #Information Flow #VQA #Structured Intermediate Representation #Decoupled Architecture #Tool Use (October 30, 2025)
[Paper Review] VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning. A detailed review of the paper 'VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning', posted on arXiv. #Review #Video Reward Models #Multimodal Reasoning #Thinking-with-Image #Visual Reasoning #Reinforcement Learning #Chain-of-Thought #Context Management (October 17, 2025)
[Paper Review] MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning. A detailed review of the paper 'MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning', posted on arXiv by Ke Wang. #Review #Multimodal Reasoning #Visual Chain-of-Thought (VCoT) #Large Multimodal Models (LMMs) #Geometric Reasoning #Diagram Generation #Dataset #Benchmark (October 17, 2025)
[Paper Review] ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping. A detailed review of the paper 'ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping', posted on arXiv by Wenbo Hu. #Review #Multimodal Reasoning #Adaptive Learning #Reinforcement Learning #Entropy Shaping #Difficulty-Aware #Chain-of-Thought #Token-Level Analysis (October 13, 2025)
[Paper Review] Factuality Matters: When Image Generation and Editing Meet Structured Visuals. A detailed review of the paper 'Factuality Matters: When Image Generation and Editing Meet Structured Visuals', posted on arXiv by Boxiang Qiu. #Review #Structured Visuals #Image Generation #Image Editing #Multimodal Reasoning #Factual Fidelity #Chain-of-Thought #Evaluation Benchmark #Diffusion Models (October 7, 2025)
[Paper Review] Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned. A detailed review of the paper 'Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned', posted on arXiv. #Review #Vision-Language Models (VLMs) #Process Reward Models (PRMs) #Multimodal Reasoning #Test-Time Scaling (TTS) #Process Supervision #Dataset Construction #Perception Errors #MCTS (October 2, 2025)
[Paper Review] More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models. A detailed review of the paper 'More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models', posted on arXiv by Fabian Waschkowski. #Review #Vision-Language Models #Multimodal Reasoning #Reasoning #Visual Forgetting #Perceptual Grounding #Reinforcement Learning #Policy Optimization #Visual Anchors (October 1, 2025)
[Paper Review] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources. A detailed review of the paper 'MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources', posted on arXiv by Jing Wang. #Review #Multimodal Reasoning #Reinforcement Learning #Variance-Aware Sampling #Gradient Vanishing #Data Curation #Chain-of-Thought #GRPO (September 26, 2025)
[Paper Review] MAPO: Mixed Advantage Policy Optimization. A detailed review of the paper 'MAPO: Mixed Advantage Policy Optimization', posted on arXiv by Xuankun Rong. #Review #Reinforcement Learning #Foundation Models #Policy Optimization #Advantage Function #Trajectory Certainty #Multimodal Reasoning #GRPO (September 24, 2025)
[Paper Review] AuditoryBench++: Can Language Models Understand Auditory Knowledge without Hearing? A detailed review of the paper 'AuditoryBench++: Can Language Models Understand Auditory Knowledge without Hearing?', posted on arXiv by Jaeho Lee. #Review #Auditory Knowledge #Large Language Models #Multimodal Reasoning #Benchmark #Chain-of-Thought #Auditory Imagination #Text-only Reasoning (September 23, 2025)
[Paper Review] MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook. A detailed review of the paper 'MARS2 2025 Challenge on Multimodal Reasoning: Datasets, Methods, Results, Discussion, and Outlook', posted on arXiv by Bowen Zhou. #Review #Multimodal Reasoning #Large Language Models (LLMs) #Multimodal Large Language Models (MLLMs) #Visual Grounding #Visual Question Answering #Advertisement Video Analysis #Real-world Scenarios #Challenge Benchmark (September 18, 2025)
[Paper Review] Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge. A detailed review of the paper 'Multimodal Reasoning for Science: Technical Report and 1st Place Solution to the ICML 2025 SeePhys Challenge', posted on arXiv by Wentao Zhang. #Review #Multimodal Reasoning #Science AI #Caption-assisted Reasoning #SeePhys Challenge #Large Language Models #Visual Question Answering #Physics Problems #Cross-modal Alignment (September 17, 2025)
[Paper Review] D-HUMOR: Dark Humor Understanding via Multimodal Open-ended Reasoning. A detailed review of the paper 'D-HUMOR: Dark Humor Understanding via Multimodal Open-ended Reasoning', posted on arXiv by Dhanvin Sanjay Namboodiri. #Review #Dark Humor Detection #Multimodal Reasoning #Vision-Language Models (VLMs) #Iterative Reasoning Refinement #Meme Analysis #Content Moderation #Cross-Modal Attention #Dataset Annotation (September 9, 2025)
[Paper Review] LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model. A detailed review of the paper 'LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model', posted on arXiv by Jianwei Yang. #Review #Vision-Language Models (VLMs) #Critic Models #Policy Models #Reinforcement Learning (RL) #Self-Criticism #Multimodal Reasoning #Preference Learning #Generative Models (September 3, 2025)
[Paper Review] InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation. A detailed review of the paper 'InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation', posted on arXiv by Yang Tian. #Review #Vision-Language-Action (VLA) #Instruction Tuning #Multimodal Reasoning #Robotic Manipulation #Catastrophic Forgetting #Mixture-of-Experts (MoE) #Flow Matching (August 5, 2025)