#LLM Interpretability

4 posts

[Paper Review] Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks

A detailed review of the paper 'Contrastive Attribution in the Wild: An Interpretability Analysis of LLM Failures on Realistic Benchmarks', posted on arXiv.

#Review #LLM Interpretability #Contrastive Attribution #Layer-wise Relevance Propagation #Attribution Graph #Failure Analysis #Transformer

April 21, 2026

[Paper Review] Brain-Grounded Axes for Reading and Steering LLM States

A detailed review of the paper 'Brain-Grounded Axes for Reading and Steering LLM States', posted on arXiv by Sandro Andric.

#Review #LLM Interpretability #Brain-Grounded AI #MEG #Phase-Locking Value #ICA #LLM Steering #Neural Decoding #Latent Space

December 22, 2025

[Paper Review] Memory Retrieval and Consolidation in Large Language Models through Function Tokens

A detailed review of the paper 'Memory Retrieval and Consolidation in Large Language Models through Function Tokens', posted on arXiv.

#Review #Large Language Models #LLM Interpretability #Function Tokens #Memory Retrieval #Memory Consolidation #Sparse Autoencoders #Pre-training

October 10, 2025

[Paper Review] Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures

A detailed review of the paper 'Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures', posted on arXiv by Andrea Passerini.

#Review #LLM Interpretability #Vector Symbolic Architectures #Neural Probing #Information Decoding #Hyperdimensional Computing #Latent Representations

October 2, 2025