#Interactive AI

14 posts

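[Paper Review] From Perception to Action: An Interactive Benchmark for Vision Reasoning

Existing VLM evaluations are structure-agnostic and focus on single-turn visual question answering (VQA), so they fail to assess how well an agent reasons about the way geometry, contact, and support relations constrain feasible actions in dynamic environments. This paper aims to address that gap.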

#Review #Vision-Language Models #Physical Reasoning #Interactive AI #3D Benchmark #Mechanical Puzzles #Spatial Packing #Embodied AI

February 24, 2026

[Paper Review] Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

This paper aims to overcome the limited control signals (text or keyboard) of existing video world models and to generate human-centric interactive virtual environments from tracking data of the user's head and hand movements.

#Review #Video Generation #Extended Reality (XR) #Diffusion Models #Human-Computer Interaction #Hand Pose Estimation #Camera Control #World Simulation #Interactive AI

February 22, 2026

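[Paper Review] Steering LLMs via Scalable Interactive Oversight

This paper addresses the "supervision gap" that arises as large language models (LLMs) automate complex, long-horizon tasks. The term refers to the difficulty non-expert users face, without sufficient domain expertise, in steering AI systems effectively and verifying complex outputs.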

#Review #Scalable Oversight #Interactive AI #Large Language Models #Human-AI Collaboration #Product Requirement Documents #Reinforcement Learning #Structured Interaction #Vibe Coding

February 5, 2026

[Paper Review] VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

This paper aims to systematically diagnose the capabilities and limitations of Vision-Language Models (VLMs) on visually rich, multi-step interactive decision-making tasks, and to improve them.

#Review #Multimodal Agents #Vision-Language Models (VLMs) #Interactive AI #Reinforcement Learning Environments #Benchmark #Decision-Making #Diagnostic Tools #Supervised Fine-tuning

January 25, 2026

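[Paper Review] VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory

This paper aims to resolve the chronic problems of autoregressive (AR) video diffusion models, namely error accumulation, motion drift, and content repetition, so that generation maintains both minute-scale long-term consistency and gradual dynamic change.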

#Review #Autoregressive Video Generation #Diffusion Models #Hybrid Memory #State-Space Models (SSM) #Long Video Synthesis #Temporal Consistency #Interactive AI

December 10, 2025

[Paper Review] Click2Graph: Interactive Panoptic Video Scene Graphs from a Single Click

This paper addresses two limitations: the closed nature of existing Video Scene Graph Generation (VSGG) and Panoptic Video Scene Graph (PVSG) systems, and the lack of semantic or relational reasoning in promptable segmentation models such as SAM/SAM2.

#Review #Panoptic Video Scene Graph Generation #Interactive AI #User Guidance #Promptable Segmentation #Video Understanding #Relational Reasoning #Human-in-the-Loop

December 2, 2025

[Paper Review] Block Cascading: Training Free Acceleration of Block-Causal Video Models

Block-causal video generation models face a severe quality-speed trade-off because of slow inference: the 1.3B model reaches only 16 FPS and the 14B model only 4.5 FPS.

#Review #Video Generation #Diffusion Models #Block-Causal Models #Inference Acceleration #Multi-GPU Parallelism #Training-Free #KV Caching #Interactive AI

November 26, 2025

[Paper Review] Simulating the Visual World with Artificial Intelligence: A Roadmap

This paper aims to systematically survey how video generation models are evolving into comprehensive physical world models and to present a roadmap.

#Review #World Models #Video Generation #AI Simulation #Generative AI #Physical Plausibility #Interactive AI #Planning #Roadmap

November 16, 2025

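[Paper Review] LongLive: Real-time Interactive Long Video Generation

This paper aims to solve the efficiency, consistency, and semantic coherence problems involved in generating high-quality long videos in real time and interactively. In particular, it seeks to overcome two limitations of existing AR and diffusion models: poor visual consistency across prompt switches and insufficient interactivity for dynamic content generation.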

#Review #Long Video Generation #Real-time #Interactive AI #Autoregressive Models #KV Cache #Streaming Tuning #Attention Sink #Diffusion Models

September 29, 2025

[Paper Review] DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning

This paper addresses the limitations that Vision Language Models (VLMs) show in precise action planning and spatial/temporal reasoning in complex, dynamic physical environments.

#Review #Vision Language Models (VLMs) #Agentic AI #Physical Reasoning #Benchmark #Simulation Environments #Action Planning #Interactive AI

August 8, 2025

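[Paper Review] InteractComp: Evaluating Search Agents With Ambiguous Queries

This paper points out that existing search agents assume user queries are complete and unambiguous, whereas real users often start with incomplete, ambiguous queries that need clarification through interaction.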

#Review #Search Agents #Interactive AI #Ambiguous Queries #Benchmarking #Language Agents #Information Retrieval #Overconfidence #Reinforcement Learning

October 29, 2025

[Paper Review] From Masks to Worlds: A Hitchhiker's Guide to World Models

This paper goes beyond simply listing models and presents a clear roadmap for building a "true world model."

#Review #World Models #Generative AI #Multimodal Learning #Masked Modeling #Interactive AI #Memory Systems #Autonomous Agents #AI Roadmap

October 24, 2025

[Paper Review] MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models

This paper addresses the failure of existing Vision-and-Language Model (VLM) evaluation benchmarks to adequately capture the depth and breadth of multi-turn conversation scenarios.

#Review #Multi-Turn Conversation #VLM Evaluation #Benchmark #Vision and Language Models #Contextual Understanding #Checklist-based Evaluation #Interactive AI

October 21, 2025

[Paper Review] Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

This paper aims to help AI agents overcome their current limitations on complex, long-horizon interactive tasks through "vicarious trial and error," mentally simulating the environment to improve reasoning and decision-making performance.

#Review #AI Agents #Reinforcement Learning #World Models #Simulation #Reasoning #Language Models #Planning #Interactive AI

October 13, 2025