#Multimodal Control

7 posts

[Paper Review] OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

This paper proposes OmniDirector to address the limited precision and data scarcity of camera control methods in existing video generation models.

#Review #Video Generation #Camera Control #Multi-shot Cloning #Diffusion Transformers #Camera Grid #Multimodal Control #Prompt Expansion

June 14, 2026

[Paper Review] Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars

This paper addresses a limitation of existing talking-avatar techniques: they lack environmental awareness and cannot perform text-driven object interaction.

#Review #Talking Avatars #Human-Object Interaction (HOI) #Text-Driven Generation #Diffusion Models #Multimodal Control #Grounded Interaction

February 2, 2026

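[Paper Review] FlowAct-R1: Towards Interactive Humanoid Video Generation

This paper targets humanoid video generation that supports real-time interaction, and seeks to overcome the limitations existing video synthesis methods face when balancing high-quality synthesis against real-time interactivity. In particular, its main research goal is to synthesize lifelike visual agents that can interact with humans in a continuous, responsive manner.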

#Review #Interactive Video Generation #Humanoid Synthesis #Real-time #Streaming Diffusion #MMDiT #Temporal Consistency #Multimodal Control #Low Latency

January 15, 2026

[Paper Review] The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

This paper aims to overcome the limitations of existing text-only or trajectory-based image-to-video (I2V) generation models and to enable richer, more user-directed simulation of "promptable world events."

#Review #World Models #Video Generation #Multimodal Control #Trajectory Guidance #Reference Images #Promptable Events #Cross-Attention #Diffusion Models

December 18, 2025

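[Paper Review] Canvas-to-Image: Compositional Image Generation with Multimodal Controls

This work aims to address the limited compositional ability and low fidelity that state-of-the-art diffusion models show when handling several types of control signals at once, such as text prompts, object references, spatial arrangements, pose constraints, and layout annotations.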

#Review #Image Generation #Diffusion Models #Compositional Control #Multimodal Control #Unified Canvas #Multi-Task Learning #Personalization

November 27, 2025

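[Paper Review] Wan-S2V: Audio-Driven Cinematic Video Generation

This work addresses the shortcomings of existing audio-driven character animation models in complex film and TV production scenarios (subtle interactions, realistic body movement, dynamic camera work).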

#Review #Audio-Driven Video Generation #Cinematic Video #Diffusion Models #Transformer Architecture #Long Video Consistency #Human Animation #Multimodal Control #Data Curation

August 27, 2025

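[Paper Review] DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion

This paper addresses three limitations of existing methods: excessive computational demands for long-duration video generation, a focus on long-term video synthesis without any 3D representation, and restriction to static single-scene reconstruction.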

#Review #Driving Scene Generation #Video Diffusion #3D Reconstruction #Gaussian Splatting #Feed-Forward Models #Temporal Coherence #Multimodal Control

October 20, 2025