[Paper Review] Prism-Δ: Differential Subspace Steering for Prompt Highlighting in Large Language Models
A detailed review of the paper 'Prism-Δ: Differential Subspace Steering for Prompt Highlighting in Large Language Models', posted on arXiv.
#Review #Prompt Highlighting #Large Language Models #Activation Steering #Differential SVD #Key-Value Channels #Cross-Covariance #Softplus Weighting #Inference-Time Intervention
March 11, 2026

[Paper Review] Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering
A detailed review of the paper 'Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering', posted on arXiv.
#Review #Audio-Language Models (LALMs) #Text Dominance #Mechanistic Interpretability #Attention Heads #Activation Steering #Multimodal Grounding #Inference-time Intervention
March 10, 2026

[Paper Review] ASA: Training-Free Representation Engineering for Tool-Calling Agents
A detailed review of the paper 'ASA: Training-Free Representation Engineering for Tool-Calling Agents', posted on arXiv by Hongwei Zeng.
#Review #Tool-Calling Agents #LLM Adaptation #Representation Engineering #Activation Steering #Training-Free #Inference-Time Control #Domain Adaptation
February 11, 2026

[Paper Review] Linear representations in language models can change dramatically over a conversation
A detailed review of the paper 'Linear representations in language models can change dramatically over a conversation', posted on arXiv.
#Review #Language Models #Representation Analysis #Interpretability #In-Context Learning #Representation Dynamics #Factuality #Conversational AI #Activation Steering
January 28, 2026

[Paper Review] Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection
A detailed review of the paper 'Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection', posted on arXiv.
#Review #Activation Steering #Large Language Models (LLMs) #Norm Preservation #Discriminative Layer Selection #Behavior Control #Inference-time Intervention #Angular Steering
January 27, 2026

[Paper Review] YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation
A detailed review of the paper 'YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation', posted on arXiv.
#Review #Large Language Models (LLMs) #Activation Steering #Sparse Autoencoders (SAEs) #Domain Adaptation #Cultural Alignment #Preference Optimization #Disentangled Representations #Fine-grained Control
January 19, 2026

[Paper Review] The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
A detailed review of the paper 'The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models', posted on arXiv by Jack Lindsey.
#Review #Language Models #Persona Control #Activation Steering #Persona Drift #Alignment #Post-training #Interpretability #Safety
January 19, 2026

[Paper Review] Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
A detailed review of the paper 'Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process', posted on arXiv.
#Review #LLM Reasoning #Mechanistic Interpretability #Sparse Autoencoders (SAEs) #Activation Steering #Unsupervised Learning #Reasoning Behaviors #Chain-of-Thought #Feature Disentanglement
December 31, 2025

[Paper Review] Generalization or Memorization: Dynamic Decoding for Mode Steering
A detailed review of the paper 'Generalization or Memorization: Dynamic Decoding for Mode Steering', posted on arXiv.
#Review #Large Language Models (LLMs) #Generalization #Memorization #Information Bottleneck (IB) #Activation Steering #Decoding Strategy #Causal Intervention #LLM Reliability
October 29, 2025

[Paper Review] Persona Vectors: Monitoring and Controlling Character Traits in Language Models
A detailed review of the paper 'Persona Vectors: Monitoring and Controlling Character Traits in Language Models', posted on arXiv by Jack Lindsey.
#Review #Large Language Models (LLMs) #Persona Control #Activation Steering #Finetuning #Behavioral Shift Detection #Interpretability #Data Filtering
August 2, 2025