#Human-Robot Interaction

8 posts

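[Paper Review] EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models

This paper aims to address the inherent instability that humanoid robots face in real-world deployment, the difficulty of integrating perception, locomotion, and manipulation under partial information, and the problem of robust sub-task transitions in dynamic environments.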

#Review #Humanoid Robots #Vision-Language Models #Task Planning #Egocentric Control #Mobile Manipulation #Active Perception #Human-Robot Interaction #Real-World Deployment

February 4, 2026

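[Paper Review] VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs

This paper introduces the Interactive Instance Object Navigation (IION) task, in which an agent receives an ambiguous natural-language instruction and must locate a specific object instance in a complex, long-range environment.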

#Review #Embodied AI #Vision and Language Navigation #Instance Object Navigation #Active Dialog #Large Language Models (LLMs) #Benchmark #Human-Robot Interaction

December 29, 2025

[Paper Review] PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

This study seeks to address a shortcoming of existing VLMs (Vision-Language Models), whose generalization to robots is limited by a viewpoint mismatch.

#Review #Egocentric Data #Physical Intelligence #VLM #Robot Control #Embodied AI #VQA Supervision #Human-Robot Interaction #Zero-shot Transfer

December 21, 2025

[Paper Review] An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges

This paper aims to provide a clear, structured guide to the rapidly evolving field of Vision-Language-Action (VLA) models.

#Review #Vision-Language-Action Models #Embodied Intelligence #Robotics #Foundation Models #Multi-modal Learning #Reinforcement Learning #Sim-to-Real Transfer #Human-Robot Interaction

December 21, 2025

[Paper Review] LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator

This paper proposes LEO-RobotAgent, a general-purpose, language-driven intelligent robot agent framework that enables various types of robots to carry out unpredictable, complex tasks.

#Review #Robotic Agent #Large Language Models (LLMs) #Embodied AI #Task Planning #Human-Robot Interaction #General-purpose Robotics #ROS

December 14, 2025

[Paper Review] Ask-to-Clarify: Resolving Instruction Ambiguity through Multi-turn Dialogue

The goal is to address the limitation that current VLA (Vision-Language-Action) based robots cannot handle ambiguous instructions and instead execute commands passively.

#Review #Embodied AI #Human-Robot Interaction #Multi-turn Dialogue #Instruction Following #Vision-Language Models #Diffusion Models #Ambiguity Resolution #Low-level Actions

September 22, 2025

[Paper Review] Do What? Teaching Vision-Language-Action Models to Reject the Impossible

This paper aims to address the problem that Vision-Language-Action (VLA) models lack the ability to recognize, interpret, and appropriately respond to commands that reference nonexistent objects or conditions ('false-premise instructions').

#Review #Vision-Language-Action Models #Robotics #False Premise Detection #Instruction Following #Human-Robot Interaction #Clarification #Instruction Tuning

August 25, 2025

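[Paper Review] VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting

This work aims to address the problem that the rigid, non-concurrent interaction paradigm of existing VLM-based robot systems hinders flexible human-robot collaboration. It seeks to build a framework in which a robot can see, hear, speak, and act concurrently, as a human does, and respond dynamically to real-time user interruptions.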

#Review #Embodied AI #Human-Robot Interaction #Vision-Language Models #Concurrency #Interruption #Robotics Control #Dual-Model Architecture #Special Tokens

October 28, 2025