#Real-time AI

9 posts

[Paper Review] OmniInteract: Benchmarking Real-World Streaming Interaction for Real-Time Omnimodal Assistants

This paper addresses the limitations of existing benchmarks in evaluating how well Omnimodal Large Language Models interact in real time in audio-visual streaming settings.

#Review #Omnimodal LLM #Streaming Interaction #Benchmark #Real-time AI #Full-duplex #Interaction-Aware Scoring

May 28, 2026

[Paper Review] Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

Online Video Large Language Models (VideoLLMs) are essential for interpreting streaming visual inputs and responding in real time, and they matter especially for Embodied Intelligence and interactive AI assistants.

#Review #Streaming Video Understanding #VideoLLMs #Chain-of-Thought (CoT) #Real-time AI #Reinforcement Learning #Knowledge Graphs #Streaming Thinking #Low Latency

March 15, 2026

[Paper Review] Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

This paper addresses the problem that the hundreds of latent tokens used by existing world models excessively increase the computational cost of real-time planning.

#Review #World Model #Discrete Tokenizer #Latent Representation #Action Planning #Model Predictive Control #Real-time AI #Compression #Vision Foundation Model

March 8, 2026

[Paper Review] MIBURI: Towards Expressive Interactive Gesture Synthesis

This paper aims to supply the body movement and expressive gestures that current large language model (LLM)-based conversational agents lack.

#Review #Embodied Conversational Agents #Gesture Synthesis #Real-time AI #Causal Models #Transformer Networks #Residual VQ-VAE #Speech-text Foundation Models

March 4, 2026

[Paper Review] TimeBill: Time-Budgeted Inference for Large Language Models

The goal is to complete inference within a given time budget while maintaining the response performance of large language models (LLMs) in time-constrained systems (e.g., robotics, autonomous driving).

#Review #LLM Inference #Time Budgeting #KV Cache Eviction #Response Length Prediction #Execution Time Estimation #Real-time AI #Performance Optimization

December 28, 2025

[Paper Review] PersonaLive! Expressive Portrait Image Animation for Live Streaming

This work addresses the problem that existing diffusion-based portrait animation focuses on visual quality and expressive realism, so its high computational cost and latency make it unsuitable for live streaming.

#Review #Live Streaming #Portrait Animation #Diffusion Models #Real-time AI #Appearance Distillation #Micro-chunk Streaming #Motion Control #Low Latency

December 14, 2025

[Paper Review] TUN3D: Towards Real-World Scene Understanding from Unposed Images

This paper presents TUN3D, the first method to perform joint layout estimation and 3D object detection on real-world scans from multi-view image input alone, without accurate camera poses or depth information.

#Review #3D Scene Understanding #Layout Estimation #3D Object Detection #Unposed Images #Sparse Convolutional Networks #Multi-view Stereo #Real-time AI

September 29, 2025

[Paper Review] MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment

This paper addresses two problems with existing text-driven motion generation methods: inaccurate alignment between linguistic descriptions and motion semantics, and slow, inefficient multi-step inference. The ultimate goal is to develop a framework that enables strong semantic alignment, high-quality motion generation, and real-time synthesis.

#Review #Text-Guided Motion Generation #Rectified Flow Matching #Preference Alignment #Human Motion Synthesis #Real-time AI #Transformer Architecture #Self-supervised Learning

August 28, 2025

[Paper Review] Online Generic Event Boundary Detection

This paper proposes a new task, online GEBD (On-GEBD), which overcomes the limitations of existing offline Generic Event Boundary Detection (GEBD) and is closer to human cognitive processes.

#Review #Online Video Analysis #Event Boundary Detection #Event Segmentation Theory #Real-time AI #Anomaly Detection #Transformer Architecture

October 9, 2025