#Logit Lens

3 posts

[Paper Review] Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

A detailed review of the paper 'Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs', posted on arXiv by Lecheng Yan.

#Review #RLVR #LLMs #Mechanistic Interpretability #Memorization Shortcuts #Data Contamination #Anchor-Adapter Circuit #Path Patching #Logit Lens

January 19, 2026

[Paper Review] Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs

A detailed review of the paper 'Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs', posted on arXiv by Bohyung Han.

#Review #Video Large Language Models #VideoQA #Mechanistic Interpretability #Attention Knockout #Temporal Reasoning #Information Flow #Model Interpretability #Logit Lens

October 27, 2025

[Paper Review] Beyond Transcription: Mechanistic Interpretability in ASR

A detailed review of the paper 'Beyond Transcription: Mechanistic Interpretability in ASR', posted on arXiv by Aviv Shamsian.

#Review #ASR #Mechanistic Interpretability #Logit Lens #Linear Probing #Activation Patching #Hallucinations #Repetitions #Encoder-Decoder

August 28, 2025