#Interpretability

26개의 포스트

[논문리뷰] ETCHR: Editing To Clarify and Harness Reasoning

ETCHR은 LLM의 CoT 생성 과정에 존재하는 논리적 결함과 불필요한 노이즈가 최종 성능을 저하시키는 문제를 해결하기 위해 고안되었습니다. 기존 LLM은 긴 Reasoning Path를 생성할 때 고수준의 논리적 일관성을 유지하는 데 한계를 보이며, 이는 결과적으로 정답률 감소로 이어집니다.

#Review #Chain-of-Thought #Reasoning #Model Editing #Inference Optimization #LLM #Knowledge Distillation #Interpretability

2026년 5월 24일

[논문리뷰] Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics

본 논문은 LRM에서 생성되는 Chain of Thought(CoT)가 모델의 최종 출력과 항상 일치하지 않는다는 'Unfaithfulness' 문제를 해결하고자 합니다 .

#Review #Large Reasoning Models #Chain of Thought #Probe Trajectories #Representation Engineering #AI Safety #Max-pooling #Interpretability

2026년 5월 18일

[논문리뷰] Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models

본 논문은 CLIP과 같은 대규모 vision-language 모델을 하위 태스크(downstream task)에 맞게 fine-tuning할 때 발생하는 OOD(Out-of-Distribution) 성능 저하 문제를 해결하고자 한다.

#Review #CLIP #Sparse Autoencoders #Robust Fine-tuning #Interpretability #Representational Drift #Computer Vision

2026년 5월 17일

[논문리뷰] Target-Oriented Pretraining Data Selection via Neuron-Activated Graph

본 논문은 LLM pretraining 과정에서 타겟 도메인 및 태스크의 특성을 효율적으로 학습하기 위한 정교한 데이터 선별 기법의 부재 문제를 해결합니다.

#Review #Large Language Models #Pretraining Data Selection #Neuron-Activated Graph #Target-Oriented Pretraining #Interpretability

2026년 4월 21일

[논문리뷰] Visual Persuasion: What Influences Decisions of Vision-Language Models?

본 연구는 Vision-Language Model (VLM) 이 시각적 요인에 의해 의사결정에 어떻게 영향을 받는지 체계적으로 이해하는 것을 목표로 합니다.

#Review #Vision-Language Models #Visual Persuasion #Prompt Optimization #Image Generation #AI Agent Behavior #Interpretability #Behavioral Evaluation

2026년 2월 17일

[논문리뷰] Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

본 논문은 Sparse Autoencoders (SAEs)가 신경망의 활성화를 해석 가능한 희소 특징으로 분해하는 데 있어 실제로 의미 있는 특징을 학습하는지 여부를 체계적으로 평가하는 것을 목표로 합니다.

#Review #Sparse Autoencoders #Interpretability #Neural Network Internals #Evaluation Baselines #Feature Decomposition #LLMs #Mechanistic Interpretability

2026년 2월 17일

[논문리뷰] Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization

논문은 LLM의 CoT(Chain-of-Thought) 추론 이 가진 높은 연산 비용과 이산 토큰 샘플링으로 인한 추론 경로 붕괴 문제를 해결하고자 합니다.

#Review #Latent Reasoning #Chain-of-Thought (CoT)#Large Language Models (LLMs)#Planning #Reinforcement Learning #Mathematical Reasoning #Decoupling #Interpretability

2026년 2월 1일

[논문리뷰] Linear representations in language models can change dramatically over a conversation

본 연구는 대규모 언어 모델(LLM) 내에서 선형 표현(Linear representations) , 특히 사실성(factuality)이나 윤리(ethics)와 같은 고수준 개념을 나타내는 표현이 대화 과정에서 어떻게 동적으로 변화 하는지 조사하는 것을 목표로 합니다.

#Review #Language Models #Representation Analysis #Interpretability #In-Context Learning #Representation Dynamics #Factuality #Conversational AI #Activation Steering

2026년 1월 28일

[논문리뷰] The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

본 논문은 대규모 언어 모델(LLM)이 기본적으로 가지는 'AI Assistant' 페르소나의 구조를 심층적으로 탐구하고, 이 페르소나가 특정 상황에서 벗어나 부적절하거나 유해한 행동으로 이어지는 '페르소나 드리프트' 현상을 해결하는 것을 목표로 합니다.

#Review #Language Models #Persona Control #Activation Steering #Persona Drift #Alignment #Post-training #Interpretability #Safety

2026년 1월 19일

[논문리뷰] The Illusion of Specialization: Unveiling the Domain-Invariant 'Standing Committee' in Mixture-of-Experts Models

본 연구는 MoE(Mixture-of-Experts) 모델 이 희소 라우팅을 통해 도메인 특화(domain specialization)를 달성한다는 일반적인 가정에 의문을 제기합니다.

#Review #Mixture-of-Experts (MoE)#Sparse Routing #Domain Specialization #Load Balancing #Interpretability #Standing Committee #LLM

2026년 1월 8일

[논문리뷰] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

본 논문은 기존 RL 접근 방식이 LLM을 단일 블랙박스 정책으로 취급하는 한계를 극복하고자 합니다.

#Review #Reinforcement Learning #Large Language Models #Policy Optimization #Interpretability #Transformer #Internal Policy #Entropy Analysis

2025년 12월 23일

[논문리뷰] Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

본 논문은 Multimodal Large Language Models (MLLMs)가 실제 환경의 극심한 시각적 열화(visual degradations) 조건에서 성능이 크게 저하되는 문제를 해결하고자 합니다.

#Review #Multimodal Large Language Models (MLLMs)#Visual Degradation #Robustness #Reasoning Chains #Supervised Fine-Tuning (SFT)#Reinforcement Learning (RL)#Degradation-Aware Reasoning #Interpretability

2025년 12월 21일

[논문리뷰] BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain

본 논문은 인간 뇌에서 시각적 개념 표현을 대규모로 발견하고 해석하는 자동화된 프레임워크인 BrainExplore 를 제안합니다. 기존 fMRI 연구의 소규모, 수동 분석 및 특정 영역 의존성의 한계를 극복하고, 방대한 시각적 개념 공간에서 정교하고 해석 가능한 뇌 활동 패턴 을 자동으로 식별하는 것을 목표로 합니다.

#Review #fMRI #Brain Mapping #Visual Representation #Interpretability #Sparse Autoencoders #Vision-Language Models #Unsupervised Learning #Neuroscience

2025년 12월 10일

[논문리뷰] Extracting Interaction-Aware Monosemantic Concepts in Recommender Systems

본 논문은 현대 추천 시스템의 잠재 임베딩이 의미론적으로 불투명하여 해석 가능성이 낮고 제어가 어렵다는 문제를 해결하고자 합니다.

#Review #Recommender Systems #Sparse Autoencoder (SAE)#Monosemantic Neurons #Interpretability #Prediction-Aware Loss #User-Item Interactions #Post-hoc Control

2025년 11월 24일

[논문리뷰] Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation

Multimodal Large Language Models (MLLMs)의 자동 회귀 토큰 생성 과정에서 시각적 입력이 출력 토큰에 미치는 영향을 설명하고, 언어적 선험 지식과 지각적 증거의 상대적 영향력을 정량화하는 것을 목표로 합니다.

#Review #MLLM #Interpretability #Attribution #Token Generation #Black-box Explanation #Hallucination Diagnosis #Multimodality #VQA

2025년 9월 29일

[논문리뷰] SIM-CoT: Supervised Implicit Chain-of-Thought

Implicit Chain-of-Thought (CoT) 모델은 토큰 효율성에도 불구하고, 명시적 CoT 대비 지속적인 성능 격차와 핵심적인 '잠재 불안정성(latent instability)' 문제에 직면해 있습니다.

#Review #Implicit Reasoning #Chain-of-Thought #LLM #Latent Space #Supervised Learning #Model Stability #Interpretability

2025년 9월 25일

[논문리뷰] From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models

기존 금융 LLM 벤치마크의 단일 점수 평가 방식(score flattening) 과 불균형한 개념 커버리지(coverage imbalance) 로 인해 모델의 실제 지식 수준과 한계를 파악하기 어렵다는 문제를 해결하고자 합니다.

#Review #Financial LLMs #Cognitive Diagnosis Model #LLM Evaluation #Knowledge Assessment #Matrix Factorization #CPA-QKA #Interpretability

2025년 8월 21일

[논문리뷰] ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

본 논문은 기존 HAR(Human Activity Recognition) 시스템의 낮은 일반화 능력 , 제한적인 제로샷 기능 , 해석 불가능성 이라는 세 가지 주요 한계를 해결하고자 합니다.

#Review #Zero-shot HAR #LLM Agents #Time-Series Analysis #Knowledge Base #Retrieval-Augmented Generation #Multi-sensor Fusion #Interpretability

2025년 8월 20일

[논문리뷰] X-Node: Self-Explanation is All We Need

그래프 신경망(GNN)의 불투명한 의사결정 문제를 해결하고, 특히 신뢰성이 필수적인 고위험 임상 환경에서 개별 노드 수준의 충실한 자체 설명(self-explanation) 을 제공하는 것을 목표로 합니다.

#Review #Graph Neural Networks #Explainable AI #Self-Explanation #Node Classification #Medical Imaging #Natural Language Processing #Interpretability

2025년 8월 18일

[논문리뷰] RPCANet++: Deep Interpretable Robust PCA for Sparse Object Segmentation

본 논문은 기존의 Robust PCA (RPCA) 모델이 가진 높은 계산 비용, 수동 튜닝에 따른 일반화 능력 부족, 그리고 경직된 사전 지식으로 인한 한계를 극복하는 것을 목표로 합니다.

#Review #Robust PCA #Deep Unfolding #Sparse Segmentation #Interpretability #Image Decomposition #Computer Vision

2025년 8월 8일

[논문리뷰] Persona Vectors: Monitoring and Controlling Character Traits in Language Models

이 논문은 대규모 언어 모델(LLMs)에서 발생하는 예상치 못한 또는 바람직하지 않은 페르소나 변화 문제를 해결하는 것을 목표로 합니다.

#Review #Large Language Models (LLMs)#Persona Control #Activation Steering #Finetuning #Behavioral Shift Detection #Interpretability #Data Filtering

2025년 8월 2일

[논문리뷰] CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning

본 논문은 대규모 시각-언어 모델(LVLM)이 시각 데이터를 통해 도시의 사회경제적 지위를 정확하고 해석 가능하게 예측하는 데 어려움을 겪는 문제를 해결하는 것을 목표로 합니다. 특히, 학습 시 접하지 못한 도시나 지표에 대한 일반화 성능 을 향상시키고, 동시에 설명 가능한 추론 과정 을 제공하고자 합니다.

#Review #Urban Sensing #Socio-Economic Status #Vision-Language Models #Reinforcement Learning #Generalization #Interpretability #Multi-modal Data

2025년 10월 31일

[논문리뷰] Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs

Multimodal Large Language Models (MLLMs)가 복잡한 시각적 계획과 상상력을 요구하는 시나리오에서 겪는 어려움을 해결하고, MLLM에 내부 시각적 스크래치패드(visual scratchpad) 를 부여하여 시각적 사고(visual thought) 를 통해 멀티모달 추론 능력을 향상시키는 것을 목표로 합니다.

#Review #Multimodal LLMs #Visual Reasoning #Latent Space #Sketch Generation #Visual Thinking #Autoregressive Generation #Interpretability

2025년 10월 29일

[논문리뷰] Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization

본 논문은 LLM의 불투명한 추론 과정을 명확히 이해하고, 기존 RL의 균일한 크레딧 할당 방식이 중요한 추론 단계를 모호하게 만드는 문제를 해결하는 것을 목표로 합니다.

#Review #LLM Reasoning #Attention Mechanisms #Reinforcement Learning #Credit Assignment #Policy Optimization #Interpretability #Preplan-and-Anchor Rhythm #Generative Models

2025년 10월 16일

[논문리뷰] ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability

논문은 기존 Long Chain-of-Thought (CoT) 추론 모델 들이 답변 정확도와 토큰 효율성에만 집중하여 신뢰성(trustworthiness) 을 간과하는 문제를 해결하고자 합니다.

#Review #Trustworthy AI #Large Reasoning Models (LRMs)#Interpretability #Faithfulness #Reliability #Chain-of-Thought (CoT)#Supervised Fine-tuning (SFT)#GRPO

2025년 10월 15일

[논문리뷰] Who's Your Judge? On the Detectability of LLM-Generated Judgments

본 논문은 LLM이 생성한 평가(judgment)를 인간의 평가와 구별하는 판단 탐지(judgment detection) 태스크를 제안하고, 그 탐지 가능성을 체계적으로 조사하는 것을 목표로 합니다.

#Review #LLM-as-a-judge #Judgment Detection #Bias Quantification #Feature Engineering #Interpretability #Peer Review #AI Ethics #Evaluation

2025년 10월 1일