#Controllability

9 posts

[Paper Review] Seedance 2.0: Advancing Video Generation for World Complexity

This study aims to overcome the limitations of existing video generation models, namely their focus on short clips and their limited control capabilities, and to achieve powerful, controllable video synthesis that can handle complex real-world scenarios.

#Review #Video Generation #Multimodal Foundation Model #Audio-Video Joint Generation #Controllability #Generative AI #Real-world Complexity

April 15, 2026

[Paper Review] How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

This study seeks to address the significant risks posed by unpredictable behavior (e.g., intent misalignment, inconsistent personality expression) in large language models (LLMs) deployed in socially sensitive domains.

#Review #Large Language Models (LLMs) #Controllability #Hierarchical Benchmark #Behavioral Granularity #Model Steering #Prompt Engineering #Activation-based Steering

March 3, 2026

[Paper Review] Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models

This paper addresses the lack of controllability with respect to text prompts in Diffusion Transformer (DiT)-based Image-to-Video (I2V) models.

#Review #Video Diffusion Models #Image-to-Video Generation #Diffusion Transformers (DiT) #Controllability #Semantic Alignment #Focal Guidance #Prompt Adherence

January 14, 2026

[Paper Review] GenCtrl -- A Formal Controllability Toolkit for Generative Models

This study criticizes the current practice of implicitly assuming that generative models are controllable, and aims to provide a theoretical framework for determining how controllable a model actually is.

#Review #Generative Models #Controllability #Reachability #Control Theory #Dialogue Systems #LLMs #T2IMs #PAC Bounds #Formal Verification

January 11, 2026

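[Paper Review] TV2TV: A Unified Framework for Interleaved Language and Video Generation

This paper seeks to overcome the limitations that existing models face in video generation tasks requiring complex semantic reasoning or iterative high-level planning. By decomposing video generation into an interleaved process of text generation and video generation, it aims to dramatically improve both visual quality and user controllability.

#Review #Video Generation #Language Modeling #Multimodal AI #Interleaved Generation #Flow Matching #Transformer #Controllability #World Models

December 4, 2025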

[Paper Review] Generating an Image From 1,000 Words: Enhancing Text-to-Image With Structured Captions

This paper aims to address the low controllability and limited expressiveness of existing text-to-image (T2I) models. Because of the mismatch between short text prompts and rich visual outputs, models tend to fill in details arbitrarily, which limits the precise control that professional use requires.

#Review #Text-to-Image Generation #Structured Captions #LLM Fusion #Controllability #Image Generation Evaluation #Diffusion Models #DimFusion #TaBR

November 10, 2025

[Paper Review] Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-To-Speech Systems

This paper aims to quantitatively analyze the mismatch between users' natural language prompts and listener perception of the resulting speech in Instruction-Guided Text-to-Speech (ITTS) systems.

#Review #Instruction-Guided TTS #Expressive Speech Synthesis #Human Perception #Subjective Evaluation #Controllability #Instruction Following #Evaluation Metrics

September 22, 2025

[Paper Review] Group Relative Attention Guidance for Image Editing

This paper points out that image editing methods based on Diffusion-in-Transformer (DiT) models lack an effective means of controlling editing strength, which limits their ability to produce customized results.

#Review #Image Editing #Diffusion Transformers #Attention Mechanism #Guidance Mechanism #Controllability #Fine-grained Control #GRAG

October 29, 2025

[Paper Review] World-in-World: World Models in a Closed-Loop World

This paper seeks to address the problem that existing evaluation protocols for world models (WMs) focus only on visual quality and therefore fail to properly measure whether an embodied agent actually succeeds at its task in a real environment.

#Review #World Models #Embodied AI #Closed-Loop Evaluation #Online Planning #Data Scaling #Controllability #Robotic Manipulation

October 22, 2025