#VLM Agents

4 posts

[Paper Review] OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

This paper points out that existing VLM Agent benchmarks report only first-attempt scores and are built mainly around solo play, so they cannot measure an agent's ability to learn and improve.

#Review #VLM Agents #Benchmark #Unreal Engine 5 #Improvement Dynamics #Agentic Reflection #Cold-start #Generalization

June 8, 2026

[Paper Review] SpatialAct: Probing Spatial Reasoning-to-Action Capabilities of VLM Agents in 3D Scenes

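This paper proposes SpatialAct to evaluate whether VLMs can go beyond simple spatial observation to act in actual 3D environments and manage the consequences of those actions. Existing spatial reasoning benchmarks mostly measure a model's understanding of static images or videos and do not consider interactions in which the model's output changes the environment.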

#Review #VLM Agents #3D Spatial Reasoning #Action-Conditioned #Interactive Refinement #Benchmark #Simulator-Grounded

June 3, 2026

[Paper Review] AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents

This study addresses the "spatial blindness" and "modality mismatch" problems that arise when existing VLM agents perform long-horizon spatial tasks.

#Review #VLM Agents #Visual Skill Memory #Reinforcement Learning #Reward Shaping #Spatial Reasoning #Self-Evolving

May 18, 2026

[Paper Review] Think3D: Thinking with Space for Spatial Reasoning

This work aims to address the limitations of existing Vision-Language Models (VLMs) in achieving genuine 3D spatial reasoning beyond 2D perception and in building consistent spatial representations.

#Review #Spatial Reasoning #3D Reconstruction #VLM Agents #Tool Calling #Reinforcement Learning #Novel View Synthesis #Iterative Exploration

January 20, 2026