#Streaming Inference

11 posts

[Paper Review] SwiftVR: Real-Time One-Step Generative Video Restoration

This paper addresses the problem of deploying generative video restoration (VR) models for high-resolution restoration in real-time video streaming environments.

#Review #Generative Video Restoration #Real-time #Diffusion Transformer #Shifted-Window Attention #Streaming Inference

June 8, 2026

[Paper Review] Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models

This paper addresses a structural limitation of modern interactive video world models: the rigidity of their Action Interface.

#Review #Interactive Video World Model #Natural Language Action Interface #Multi-Entity Control #Cross-Entity Transfer #Streaming Inference #Self-Forcing Distillation

May 18, 2026

[Paper Review] OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation

This work aims to perform cross-embodiment video generation across diverse humanoid embodiments, in order to address the high cost and limited scalability of collecting high-quality data for robot learning.

#Review #Cross-embodiment Video Generation #Diffusion Transformer #Embodiment-specific Adaptation #Streaming Inference #Paired-free Learning

May 17, 2026

[Paper Review] SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation

This paper aims to resolve the severe trade-off between computational efficiency and input-image fidelity that arises in 2K high-resolution I2V generation.

#Review #Image-to-Video #High-Resolution Generation #Diffusion Transformer #Conditional Segment-wise Generation #Efficiency #Streaming Inference

May 7, 2026

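[Paper Review] Qwen3.5-Omni Technical Report

This paper aims to overcome the limitations of the passive perceive-and-respond paradigm of existing multimodal models and to build a unified model with the agentic behavior and real-time interaction capabilities required in real-world environments.

#Review #Omnimodal #Thinker-Talker Architecture #ARIA #Hybrid MoE #Streaming Inference #Audio-Visual Vibe Coding

April 19, 2026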

[Paper Review] Qwen3-ASR Technical Report

This paper introduces the Qwen3-ASR model family, which aims to deliver state-of-the-art performance and efficiency beyond the limitations of existing ASR models.

#Review #ASR #Language Identification #Forced Alignment #Large Audio-Language Models #Multilingual Speech Recognition #Streaming Inference #Qwen3-Omni

January 29, 2026

[Paper Review] MotionStream: Real-Time Video Generation with Interactive Motion Controls

This paper proposes a new framework that addresses the high latency (on the order of minutes) and non-causal processing of existing motion-controlled video generation models, which make real-time interaction impossible. Through interactive motion control, it enables real-time streaming generation of infinite-length video.

#Review #Real-Time Video Generation #Motion Control #Diffusion Models #Autoregressive Generation #Self-Forcing #Attention Sink #Streaming Inference #Video Distillation

November 9, 2025

[Paper Review] LongCat-Flash-Omni Technical Report

LongCat-Flash-Omni is a state-of-the-art open-source omni-modal model with 560B parameters that aims to unify robust offline multimodal understanding with low-latency, real-time audio-visual interaction.

#Review #Omni-modal AI #Multimodal LLM #Real-time Interaction #Mixture-of-Experts (MoE) #Streaming Inference #Distributed Training #Curriculum Learning #Audio-Visual Perception

November 9, 2025

[Paper Review] EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

This paper focuses on the problem that SLLMs (Speech-to-Speech Large Language Models) derived from text-based LLMs show degraded knowledge and reasoning capabilities.

#Review #Speech-to-Speech LLMs #Acoustic-Semantic Gap #Echo Training #Unit Language #Streaming Inference #Knowledge-based QA

September 12, 2025

[Paper Review] MIDAS: Multimodal Interactive Digital-human Synthesis via Real-time Autoregressive Video Generation

This paper aims to build an interactive digital-human video generation system that responds to diverse input signals in real time while maintaining low latency and high visual consistency. It seeks to overcome the limitations of existing approaches, such as high latency, high computational cost, and limited controllability.

#Review #Multimodal Generation #Digital Human Synthesis #Real-time Video Generation #Autoregressive LLM #Diffusion Models #Deep Compression Autoencoder #Exposure Bias Mitigation #Streaming Inference

August 28, 2025

[Paper Review] Qwen3Guard Technical Report

This work aims to address two problems with existing guardrail models: the limitation of binary classification and their incompatibility with streaming LLM inference.

#Review #LLM Safety #Guardrail Models #Multilingual AI #Real-time Moderation #Tri-class Classification #Instruction Tuning #Streaming Inference

October 17, 2025