#Multimodal Perception

6개의 포스트

[논문리뷰] Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

본 논문은 기존 autonomous agent 벤치마크가 보유한 세 가지 핵심적인 한계점인 trajectory-opaque grading, 불충분한 안전성 및 견고성 평가, 그리고 모달리티의 제한성을 해결하기 위해 Claw-Eval 을 제안합니다.

#Review #Autonomous Agents #Benchmark #Trajectory-aware Grading #Safety Evaluation #Robustness Testing #Multimodal Perception

2026년 4월 7일

[논문리뷰] Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception

논문은 멀티모달 대규모 언어 모델(MLLMs)이 미세한 시각 정보를 인식하는 데 겪는 어려움, 즉 전역적 컨텍스트에 의해 중요한 세부 정보가 가려지는 문제를 해결하고자 합니다.

#Review #Multimodal Perception #Fine-Grained Analysis #Knowledge Distillation #Region-to-Image #MLLMs #ZoomBench #Reinforcement Learning

2026년 2월 15일

[논문리뷰] MobileVLA-R1: Reinforcing Vision-Language-Action for Mobile Robots

본 논문은 사족 보행 로봇의 자연어 명령을 연속적인 제어로 연결하는 데 따르는 근본적인 과제를 해결하고자 합니다.

#Review #Vision-Language-Action (VLA)#Mobile Robotics #Quadruped Robots #Chain-of-Thought (CoT)#Reinforcement Learning (RL)#Embodied AI #Multimodal Perception

2025년 11월 26일

[논문리뷰] Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

본 논문은 기존의 단일(monolithic) VLM(Vision-Language Model)이 가진 정밀성, 결정론적 제어 및 복합적 시각 작업 처리 능력의 한계를 극복하고자 합니다.

#Review #Visual Agent #Multimodal Perception #Tool-Augmented LLM #Agentic AI #Visual Reasoning #Computer Vision #Structured Outputs #ReAct Framework

2025년 11월 18일

[논문리뷰] 10 Open Challenges Steering the Future of Vision-Language-Action Models

본 논문은 Vision-Language-Action (VLA) 모델 의 개발과 광범위한 수용을 가속화하기 위해 현재 연구 분야에서 직면한 10가지 주요 개방형 과제를 식별하고 논의하는 것을 목표로 합니다.

#Review #Vision-Language-Action Models #Embodied AI #Robotics #Multimodal Perception #Cross-Robot Generalization #Hierarchical Planning #World Models #Robot Safety

2025년 11월 10일

[논문리뷰] Visual Jigsaw Post-Training Improves MLLMs

본 논문은 기존 MLLM(Multimodal Large Language Models)의 텍스트 중심 후속 훈련 패러다임이 시각 신호에 대한 세밀한 이해를 과소평가한다는 문제점을 해결하고자 합니다.

#Review #MLLMs #Post-training #Self-supervised Learning #Visual Understanding #Jigsaw Puzzles #RLVR #Multimodal Perception #Spatial Reasoning

2025년 9월 30일