#Causal Attention

5 posts

[Paper Review] Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation

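To enable real-time interactive video generation, this work addresses two problems that arise when distilling an existing bidirectional video diffusion model into a few-step autoregressive (AR) model: the architectural gap between the two models and the violation of frame-level injectivity.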

#Review #Autoregressive Video Generation #Diffusion Models #Model Distillation #Real-Time AI #Causal Attention #ODE Distillation #Frame-level Injectivity #Teacher Forcing

February 2, 2026

[Paper Review] Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models

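This paper aims to explain why large language models (LLMs) are sensitive to prompt structure, and in particular to analyze how the order of the context affects performance on multiple-choice question answering (MCQA) tasks.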

#Review #Prompt Engineering #Large Language Models #Causal Attention #Multiple-Choice QA #Prompt Order Sensitivity #Information Bottleneck #Decoder-only Transformers

January 21, 2026

[Paper Review] KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs

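The goal is to extract high-quality text embeddings efficiently by resolving two structural problems that arise when a decoder-only LLM is used as a training-free text embedding backbone: the information asymmetry caused by causal attention, and the semantic compression bias caused by the next-token prediction objective.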

#Review #Text Embedding #Decoder-only LLMs #Training-free #KV Re-routing #Causal Attention #Representation Learning #Intrinsic Dimensionality

January 5, 2026

[Paper Review] Causal Attention with Lookahead Keys

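This study targets a limitation of standard causal attention, a core component of autoregressive language models: because each position relies only on the preceding context, the model's grasp of global context and its natural language understanding ability are hindered.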
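As a rough illustration of what "relies only on the preceding context" means, the following minimal sketch applies a causal mask to a matrix of raw attention scores. It shows standard causal attention only, not the lookahead-keys method reviewed in the post; the function name `causal_attention_weights` and the toy score matrix are assumptions made for this example.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j], with future positions (j > i) masked out.

    `scores` is a square list of lists of raw attention scores.
    This is an illustrative helper, not code from the reviewed paper.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Position i may attend only to positions 0..i (itself and the past).
        visible = scores[i][: i + 1]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Future positions receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights


# Toy input: identical scores everywhere, so each row spreads its weight
# evenly over the positions it is allowed to see.
scores = [[0.0] * 4 for _ in range(4)]
weights = causal_attention_weights(scores)
```

With uniform scores, the first position can attend only to itself, while the last position averages over all four positions. Under the mask, earlier tokens never receive information from later ones.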

#Review #Causal Attention #Lookahead Keys #Autoregressive Modeling #Language Models #Transformer #Perplexity Reduction #Parallel Training #Efficient Inference

September 10, 2025

[Paper Review] Sparser Block-Sparse Attention via Token Permutation

This paper aims to resolve the heavy computational cost and memory bottleneck that the O(N^2) complexity of the self-attention mechanism causes when LLMs process long contexts.
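To make the O(N^2) growth concrete, here is a small arithmetic sketch that counts how many query-key score entries a single attention head computes for a sequence of N tokens. The helper name `attention_score_entries` is hypothetical, and the count ignores heads, layers, and any sparsity technique such as the one discussed in the post.

```python
def attention_score_entries(n, causal=False):
    """Number of query-key score entries for a sequence of n tokens.

    Full attention scores every (query, key) pair: n * n entries.
    Causal attention keeps only pairs with key index <= query index:
    n * (n + 1) / 2 entries, which is still quadratic in n.
    """
    if causal:
        return n * (n + 1) // 2
    return n * n


# Doubling the sequence length quadruples the full score matrix.
full_1k = attention_score_entries(1024)
full_2k = attention_score_entries(2048)
causal_2k = attention_score_entries(2048, causal=True)
```

The causal mask roughly halves the work but does not change the quadratic scaling, which is why long-context prefilling stays expensive without further sparsity.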

#Review #Large Language Models (LLMs) #Self-Attention #Block-Sparse Attention #Token Permutation #Computational Efficiency #Prefilling #Long Context #Causal Attention

October 27, 2025