#LLM Steering

3개의 포스트

[논문리뷰] UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering

본 논문은 LLM의 행동 제어를 위한 기존 Activation Steering 방법론들이 가진 확장성 및 구성적 제약 문제를 해결하기 위해 UniSteer를 제안합니다.

#Review #LLM Steering #Activation Space #Flow Matching #Text-Guided Control #Activation Inversion #Multi-Constraint #Zero-shot Classification

2026년 5월 28일

[논문리뷰] Brain-Grounded Axes for Reading and Steering LLM States

본 연구는 LLM(대규모 언어 모델)의 해석 가능성 방향이 종종 외부 접지(external grounding)가 부족하다는 문제에 주목합니다. 이를 해결하기 위해 인간의 뇌 활동을 LLM의 내부 상태를 해석하고 조종하기 위한 안정적이고 외부적으로 접지된 좌표계로 정의하는 것을 목표로 합니다.

#Review #LLM Interpretability #Brain-Grounded AI #MEG #Phase-Locking Value #ICA #LLM Steering #Neural Decoding #Latent Space

2025년 12월 22일

[논문리뷰] CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection

본 논문은 기존의 Sparse Autoencoder (SAE) 기반 LLM 조향 방식이 요구하는 대규모 대조 데이터셋 또는 방대한 활성화 저장 공간 의 한계를 해결하고자 합니다.

#Review #Sparse Autoencoders #LLM Steering #Feature Selection #Correlation Analysis #AI Safety #Bias Mitigation #Mechanistic Interpretability

2025년 8월 20일