Latest Posts

[Paper Review] YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation

This paper aims to resolve the latent-factor entanglement and instability that affect existing dense steering vector methods when they are used for fine-grained control of LLM behavior.

#Review #Large Language Models (LLMs) #Activation Steering #Sparse Autoencoders (SAEs) #Domain Adaptation #Cultural Alignment #Preference Optimization #Disentangled Representations #Fine-grained Control

January 19, 2026

[Paper Review] The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

This paper examines in depth the structure of the 'AI Assistant' persona that large language models (LLMs) adopt by default, and aims to address 'persona drift', in which a model departs from this persona in certain situations and produces inappropriate or harmful behavior.

#Review #Language Models #Persona Control #Activation Steering #Persona Drift #Alignment #Post-training #Interpretability #Safety

January 19, 2026

[Paper Review] Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

This paper aims to give a mechanistic account of the 'Spurious Rewards Paradox', in which LLMs tuned with RLVR (Reinforcement Learning with Verifiable Rewards) sometimes improve in performance even when the rewards are spurious.

#Review #RLVR #LLMs #Mechanistic Interpretability #Memorization Shortcuts #Data Contamination #Anchor-Adapter Circuit #Path Patching #Logit Lens

January 19, 2026

[Paper Review] SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature

Existing evaluations of multimodal large language models (MLLMs) fail to properly measure evidence-based reasoning, that is, the ability to understand long scientific papers in depth and identify causal relationships, and often reward reliance on surface-level retrieval or parametric knowledge alone.

#Review #Long-Context Understanding #Multimodal AI #Scientific Literature #Evidence-based Reasoning #MLLM Evaluation #Benchmarking #Cross-modal Reasoning #Information Synthesis

January 19, 2026

[Paper Review] Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

The goal is to address the problem that Chain-of-Thought (CoT) reasoning in large language models (LLMs) produces long, low-bandwidth sequences of discrete tokens, and to develop a stochastic, sampling-based continuous reasoning mechanism that, like humans, maintains a distribution over several plausible next steps while reasoning.

#Review #Large Language Models #Reasoning #Chain-of-Thought #Reinforcement Learning #Stochastic Reasoning #Continuous Representation #Token Efficiency

January 19, 2026

[Paper Review] Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation

This paper addresses the difficulty of applying prompt-based segmentation foundation models such as SAM3, which perform strongly on general natural images, directly to medical image segmentation. The obstacles are severe domain shift, the absence of privileged spatial prompts, and the need to reason about complex anatomical and volumetric structures.

#Review #Medical Image Segmentation #Foundation Models #SAM3 #Fine-tuning #Prompt-driven #Domain Adaptation #Text-guided Segmentation

January 19, 2026

[Paper Review] CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation

This paper points out that existing single-subject animation methods cannot handle multiple subjects, diverse character types, or spatial misalignment between the reference image and the driving pose.

#Review #Multi-subject Animation #Pose-driven Animation #Diffusion Models #Spatial Misalignment #Unbind-Rebind Paradigm #Character Animation #Video Generation

January 19, 2026

[Paper Review] CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion

This paper aims to address catastrophic forgetting, so that a robot can keep learning new tasks in real environments without losing existing knowledge, and to apply exemplar-free continual learning, which works without stored past data or task identifiers, to Vision-Language-Action (VLA) models.

#Review #Continual Learning #Vision-Language-Action Models #Adapter Learning #Catastrophic Forgetting #Autonomous Routing #Parameter-Efficient Learning #Robotics

January 19, 2026

[Paper Review] ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

This work addresses the problem that existing code generation benchmarks focus on evaluating static code logic and overlook the dynamic, full-process requirements of real backend development, such as environment configuration and service deployment.

#Review #Backend Development #LLM Agents #Code Generation #Benchmarking #DevOps #Containerization #End-to-End Testing #Environment Configuration

January 19, 2026

[Loki] Bitmap Decoder Optimization: 93.5% Throughput Improvement

An analysis of a PR that improved throughput by 93.5% by specializing the Loki dataobj bitmap decoder for booleans and switching it to memory.Bitmap.

#Grafana Loki #Go #Bitmap #Decoder #Performance #Data Object

January 19, 2026

[vllm] Draft Model-Based Speculative Decoding Support

Officially integrates speculative decoding with a separate small draft model into the vLLM V1 engine.

#vllm #Performance

January 19, 2026

[uvloop] Fixing the _ready_len Race Condition

Replaces the manually maintained _ready_len counter with a direct call to len(self._ready), eliminating the race condition.

#uvloop #Race Condition #Event Loop #Cython

January 19, 2026

[llm-compressor] Memoryless Observers - Memory-Efficient Weight Observers

Switches the weight observers used in quantization calibration to a memoryless approach, greatly reducing memory usage.

#llm-compressor #Performance

January 19, 2026

[Triton] Added Support for M=64 2CTA Mode

Adds support for 2CTA mode with the M=64 instruction shape on the Blackwell architecture, increasing TensorMemory layout flexibility.

#Triton #NVIDIA #Blackwell #CTA #TensorMemory

January 18, 2026

[Paper Review] When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs

The goal is to understand and mitigate personalization-induced hallucinations, in which personalized large language models (LLMs) raise user satisfaction while distorting factual reasoning.

#Review #Personalized LLMs #Hallucination Mitigation #Factual Reasoning #Representation Entanglement #Inference-time Steering #Question Answering #Factuality Preservation

January 18, 2026

[Paper Review] Reasoning Models Generate Societies of Thought

This paper investigates the mechanism behind the sophisticated reasoning abilities of large language models (LLMs) and hypothesizes that these abilities emerge not simply from increased computation but from internally simulating a 'society of thought', a complex multi-agent interaction.

#Review #Reasoning Models #Large Language Models (LLMs) #Multi-Agent Systems #Society of Thought #Mechanistic Interpretability #Reinforcement Learning #Cognitive Diversity #Conversational AI

January 18, 2026

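[Paper Review] More Images, More Problems? A Controlled Analysis of VLM Failure Modes

This paper aims to systematically analyze the limitations and failure causes of recent large vision-language models (LVLMs) in multi-image settings. In particular, it evaluates how effectively the models aggregate information across images, track multiple concepts simultaneously, and remain robust to visual distractors, in order to identify their fundamental weaknesses.

#Review #Vision Language Models #Multi-Image Understanding #Failure Analysis #Evaluation Benchmark #Attention Mechanism #Fine-tuning #MIMIC

January 18, 2026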

[Paper Review] Language of Thought Shapes Output Diversity in Large Language Models

This paper addresses the lack of output diversity in large language models (LLMs), for example mode collapse and the overrepresentation of particular cultural values.

#Review #Large Language Models #Output Diversity #Multilingual Reasoning #Language of Thought #Sampling Strategies #Pluralistic Alignment #Hidden State Analysis #Cognitive Science

January 18, 2026

[Paper Review] AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems

This paper aims to evaluate how effectively LLM-based agents can plan and act in physically constrained real-world settings, in particular Space Planning Problems (SPPs), which have diverse objectives and strict constraints.

#Review #LLM Agents #Space Planning #Benchmark #Agentic Planning #Physics Constraints #Decision Making #Zero-Shot Learning

January 18, 2026

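[triton] [Blackwell] Analyzing Triton's tcgen05.ld.red Optimization for NVIDIA's Next-Generation Architecture

Analyzes a case in which the Blackwell architecture's ability to perform a TMEM load and a reduction at the same time was implemented in Triton Gluon to optimize performance.

#Triton #Blackwell #NVIDIA #GPU #Optimization #MLIR

January 16, 2026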