#Orthogonality

1 post

[Paper Review] OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features

This paper aims to resolve the feature absorption and feature composition problems that affect existing Sparse Autoencoders (SAEs), and thereby improve the interpretability and atomicity of the features extracted from the internal activations of LLMs.

#Review #Sparse Autoencoders #Mechanistic Interpretability #Feature Disentanglement #Orthogonality #LLM Features #Feature Absorption #Feature Composition

October 6, 2025