#Geometric Reasoning

5 posts

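[Paper Review] Revisiting Articulated Parts Perception in Robot Manipulation

This study addresses a gap in existing robot manipulation research, which has focused heavily on static object recognition and therefore does not adequately capture the complex kinematic properties of articulated objects.

#Review #Articulated Parts #Robot Manipulation #Part Segmentation #Motion Estimation #Geometric Reasoning

June 11, 2026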

[Paper Review] PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control

This study addresses a fundamental problem with the 'region-tolerant' paradigm that existing GUI agents mainly rely on: it fails on precise geometric construction tasks.

#Review #GUI Agents #Geometric Reasoning #Precision-Sensitive #Dependency-Structured Planning #Pixel-Grounded Supervised Tuning #Reinforcement Learning #Semantic-Execution Gap

May 17, 2026

[Paper Review] Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking

This paper aims to overcome the limitations of text-only reasoning models on complex reasoning problems, where they struggle to grasp implicit spatial and geometric relationships.

#Review #Multimodal Reasoning #Visual Thinking #Reinforcement Learning #Code Generation #Geometric Reasoning #Adaptive Reward Mechanism #Problem Solving

December 31, 2025

[Paper Review] GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning

This paper aims to address the hallucinations and inaccurate reasoning that multimodal large language models (MLLMs) frequently exhibit on vision-intensive tasks such as geometric reasoning. It sets out to quantify the visual perception bottleneck of MLLMs, the root cause of these problems, and to overcome it so that reasoning training becomes as effective as possible.

#Review #Multimodal Large Language Models (MLLMs) #Geometric Reasoning #Visual Perception #Reinforcement Learning (RL) #Two-stage Training #GeoPQA Benchmark #Perceptual Bottleneck

September 23, 2025

[Paper Review] MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning

This paper aims to resolve the difficulties large language models (LLMs) face on mathematical problems, such as geometry, that inherently depend on visual aids.

#Review #Multimodal Reasoning #Visual Chain-of-Thought (VCoT) #Large Multimodal Models (LMMs) #Geometric Reasoning #Diagram Generation #Dataset #Benchmark

October 17, 2025