#Streaming Video

4 posts

[Paper Review] Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

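This paper addresses the problem that existing 3D LMMs operate offline, requiring either full-scene observation or predefined video clips, which limits their use in real-time settings. That offline design is hard to apply in embedded applications such as autonomous robots and AR/VR devices, where real-time interaction is essential.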

#Review #3D Large Multimodal Models #Online Spatial Understanding #Incremental Geometry Priors #Visual-Spatial Feature Integration #Geometry-Adaptive Voxel Compression #Streaming Video

June 7, 2026

[Paper Review] VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

This paper addresses the large KV cache memory cost that autoregressive video diffusion models incur during streaming generation.

#Review #Video Diffusion #Multi-Head Latent Attention #KV Cache #Autoregressive Generation #Low-Rank Latent #Streaming Video #3D-RoPE

June 1, 2026

[Paper Review] Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

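Humans perceive and understand real-world space through streams of visual observation, so the ability to maintain and update spatial evidence in a streaming fashion over a potentially unbounded video stream is essential to spatial intelligence.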

#Review #Spatial Intelligence #Test-Time Training #MLLM #Streaming Video #Hybrid Architecture #Spatiotemporal Convolution

March 12, 2026

[Paper Review] Autoregressive Universal Video Segmentation Model

The goal is to unify today's fragmented video segmentation tasks under a single architecture and to develop a universal model that covers both prompted and unprompted video segmentation.

#Review #Video Segmentation #Autoregressive Model #Universal Model #State Space Models #Mamba #Parallel Training #Streaming Video #Deep Learning

August 27, 2025