#Sparse Autoencoder

3 posts

[Paper Review] Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders

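This paper starts from the problem that, although data engineering is central to improving model performance in LLM post-training, existing approaches rely mainly on external feedback (human preferences, reward models, rollout results, and so on), which makes them costly and limits their efficiency.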

#Review #Sparse Autoencoder #LLM Post-training #Reinforcement Learning #Data Engineering #Mechanistic Interpretability #Curriculum Learning #Data Selection

May 27, 2026

[Paper Review] SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models

Vision-Language Models (VLMs) such as CLIP are a core component of multimodal AI, but because they are trained on large-scale uncurated data, they carry serious social and spurious biases.

#Review #Vision-Language Models #CLIP #Debiasing #Sparse Autoencoder #Post-Hoc #Zero-Shot #Feature Disentanglement #Bias Mitigation

March 23, 2026

[Paper Review] Thought Communication in Multiagent Collaboration

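This paper aims to overcome the inherent limitations of natural-language communication (information loss, ambiguity) in multi-agent systems (MAS) built on large language models (LLMs).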

#Review #Multiagent Systems #LLM Communication #Latent Variable Models #Identifiability Theory #Thought Communication #Sparse Autoencoder #Prefix Tuning

October 24, 2025