#Real-time Inference

9 posts

[Paper Review] SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

This paper proposes SANA-Streaming to address two problems that arise in real-time streaming Video-to-Video (V2V) editing: maintaining temporal consistency and limited inference performance.

#Review #Diffusion Transformer #Streaming Video Editing #Hybrid Architecture #Cycle-Reverse Regularization #Mixed-Precision Quantization #Real-time Inference

May 31, 2026

[Paper Review] minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models

This paper addresses the lack of a pipeline for converting existing high-quality Video Foundation Models into Interactive World Models capable of real-time interaction.

#Review #Video World Models #Diffusion Models #Autoregressive #Distillation #Real-time Inference #Camera Control

May 28, 2026

[Paper Review] WorldKV: Efficient World Memory with World Retrieval and Compression

This paper aims to give Autoregressive video models a long-term memory that is spatially and temporally consistent while preserving real-time performance.

#Review #World Models #Autoregressive Video Diffusion #KV Cache Management #World Retrieval #World Compression #Real-time Inference #Long-term Consistency

May 21, 2026

[Paper Review] StreamingClaw Technical Report

Applications such as Embodied Intelligence, AI Hardware, Autonomous Driving, and Intelligent Cockpits rely heavily on a real-time perception–decision–action closed loop, which imposes strict requirements on real-time streaming video understanding.

#Review #Streaming Video Understanding #Embodied Intelligence #Multi-agent Systems #Long-term Memory #Proactive Interaction #Real-time Inference #OpenClaw

March 25, 2026

[Paper Review] Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

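This paper addresses the difficulty of applying existing multimodal models to edge devices, which stems from the large amount of training data they require and the substantial resources needed to deploy them. It aims to build an efficient model that performs visual understanding and generation simultaneously through a unified multimodal architecture while enabling real-time inference on mobile devices.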

#Review #Multimodal AI #Vision-Language Models #Diffusion Models #Mobile Devices #Edge Computing #Model Efficiency #Unified Architecture #Real-time Inference

February 23, 2026

[Paper Review] Proactive Hearing Assistants that Isolate Egocentric Conversations

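This paper aims to develop a proactive hearing assistant that automatically identifies and isolates the wearer's conversation partners and suppresses other interfering speech, without explicit prompts from the user. It focuses on operating in real time in complex multi-party conversation settings and on supporting the wearer's autonomous participation in conversation.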

#Review #Proactive Hearing Assistant #Egocentric Audio Processing #Speech Separation #Turn-taking Dynamics #Dual-Model Architecture #Real-time Inference #Wearable Devices #Dialogue Modeling

November 18, 2025

[Paper Review] Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

This paper presents Lumine, an "open recipe" for building generalist agents that can complete hours-long missions in real time, with human-level efficiency, in complex 3D open-world environments.

#Review #Generalist Agent #3D Open World #Vision-Language Model #Imitation Learning #Real-time Inference #Hybrid Thinking #Action Chunking #Genshin Impact

November 12, 2025

[Paper Review] Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds

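This work tackles the challenging problem of reconstructing 3D human Gaussians from extremely sparse input: just two images, one front and one back. It aims to overcome the costly data collection and long processing times of existing methods, and to lower the barrier to digital human creation in a user-friendly way.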

#Review #3D Human Reconstruction #Gaussian Splatting #Sparse View #Two-Image Input #Real-time Inference #Point Cloud Prediction #Feed-forward Network

August 22, 2025

[Paper Review] Thai Semantic End-of-Turn Detection for Real-Time Voice Agents

This paper aims to carry out the first systematic study of Thai text-only EOT (End-of-Turn) detection for real-time voice agents.

#Review #End-of-Turn Detection #Thai NLP #Voice Agents #Real-time Inference #Transformer Models #Few-shot Learning #Fine-tuning #Latency Optimization

October 7, 2025