#Truth Co-occurrence Hypothesis

1 post

[Paper Review] Emergence of Linear Truth Encodings in Language Models

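The main goal is to uncover the mechanistic principles behind why and how a "truth subspace" emerges in language models (LMs). This subspace linearly separates true statements from false ones. The paper aims to provide a foundational understanding that could help mitigate hallucinations in LMs.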

#Review #Language Models #Truth Encoding #Linear Subspaces #Mechanistic Interpretability #Transformer Models #Learning Dynamics #Truth Co-occurrence Hypothesis #Hallucinations

October 24, 2025