#Self-Attention

8 posts

[Paper Review] VISion On Request: Enhanced VLLM efficiency with sparse, dynamically selected, vision-language interactions

Existing approaches to improving the efficiency of Large Vision-Language Models (LVLMs) rely mainly on visual token reduction.

#Review #LVLM Efficiency #Sparse Interaction #Cross-Attention #Self-Attention #Adaptive Inference #Visual Feature Refinement #Computational Cost Reduction

March 24, 2026

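[Paper Review] Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

Current RoPE (Rotary Position Embeddings) implementations use only the real part of the complex-valued inner product when computing attention scores and discard the imaginary part, losing relational information that matters for modeling long-context dependencies. This paper aims to address that loss.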

#Review #Rotary Position Embedding #Long-Context LLMs #Complex-Valued Neural Networks #Self-Attention #Positional Encoding #Information Loss #Length Extrapolation

December 8, 2025

[Paper Review] Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance

This paper aims to address the quality and controllability problems that arise during the sampling process of diffusion models.

#Review #Diffusion Models #Guidance Sampling #Optimal Transport #Sinkhorn Algorithm #Self-Attention #Adversarial Perturbation #Image Generation #ControlNet

November 12, 2025

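[Paper Review] Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation

This paper aims to preserve a user-specified identity consistently and at high quality in video generation, while addressing the excessive training parameters of existing methods and their lack of compatibility with other AI generation models. In particular, it proposes a practical identity control solution built on a lightweight, plug-and-play framework.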

#Review #Video Generation #Identity Preservation #Plug-and-Play #Diffusion Models #Self-Attention #Lightweight AI #Conditional Image Branch

August 14, 2025

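[Paper Review] ACG: Action Coherence Guidance for Flow-based VLA models

This paper aims to resolve action incoherence (jerks, pauses, jitter) in Vision-Language-Action (VLA) models trained through imitation learning, particularly Diffusion and Flow Matching models, in order to prevent the instability and trajectory drift that cause precise manipulation to fail.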

#Review #Action Coherence #Flow Matching #VLA Models #Guidance #Robotics #Imitation Learning #Transformer #Self-Attention

October 28, 2025

[Paper Review] Sparser Block-Sparse Attention via Token Permutation

This paper aims to address the heavy computational cost and memory bottleneck caused by the O(N^2) complexity of the self-attention mechanism when LLMs process long contexts.

#Review #Large Language Models (LLMs) #Self-Attention #Block-Sparse Attention #Token Permutation #Computational Efficiency #Prefilling #Long Context #Causal Attention

October 27, 2025

[Paper Review] HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives

Current text-to-video (T2V) models excel at generating single clips but fail to coherently generate multi-shot narratives, which are the essence of storytelling. This paper aims to close that "narrative gap".

#Review #Text-to-Video Generation #Multi-Shot Video #Narrative Coherence #Diffusion Models #Self-Attention #Cinematic AI #Video Consistency #Directorial Control

October 24, 2025

[Paper Review] DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models

This paper aims to address unintended semantic leakage in Text-to-Image (T2I) models, a phenomenon in which semantically related features are wrongly transferred between different entities. It also seeks to overcome the optimization cost and reliance on external inputs of existing methods.

#Review #Semantic Leakage #Text-to-Image Models #Attention Control #Inference-time Mitigation #Diffusion Models #Evaluation Dataset #Self-Attention

October 23, 2025