Latest Posts

[Paper Review] δ-mem: Efficient Online Memory for Large Language Models

This study addresses the inability of LLMs to effectively accumulate and reuse past history in long-running conversations and agent tasks. The existing approach of extending the context window incurs compute cost that grows quadratically and suffers from information loss and context rot.

#Review #Large Language Models #Online Memory #Associative Memory #Low-rank Correction #Delta-rule Learning #Attention Mechanism

May 12, 2026

[Paper Review] WildRelight: A Real-World Benchmark and Physics-Guided Adaptation for Single-Image Relighting

This paper starts from the observation that recent single-image relighting methods based on generative models perform well on synthetic datasets, yet their performance in real-world (in-the-wild) settings has remained largely unverified.

#Review #Single-Image Relighting #Dataset #Inverse Rendering #Diffusion Posterior Sampling #Test-Time Adaptation #Sim-to-Real

May 12, 2026

[Paper Review] The Many Faces of On-Policy Distillation: Pitfalls, Mechanisms, and Fixes

This study notes that OPD and OPSD are effective for internalizing system prompts and knowledge, and sets out to identify the root causes of the training instability and performance degradation reported in recent work.

#Review #On-Policy Distillation #Self-Distillation #Language Models #Reverse-KL #Privileged Information #Optimization Stability #RLVR

May 12, 2026

[Paper Review] RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards

This paper addresses the problem of efficiently training Deep Research agents in open-ended environments where verifiable rewards are unavailable.

#Review #Meta-RL #Deep Research #Reinforcement Learning #Policy Decomposition #Rubric-guided #Stagewise Credit Assignment #Reflection Meta-Policy

May 12, 2026

[Paper Review] MoCam: Unified Novel View Synthesis via Structured Denoising Dynamics

This paper addresses the fundamental conflict between geometric information and appearance information that arises in generative Novel View Synthesis.

#Review #Novel View Synthesis #Diffusion Models #Structured Denoising #Geometry-Appearance Disentanglement #4D Re-camera #Video Generative Models

May 12, 2026

[Paper Review] Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

This paper addresses the computational inefficiency of modern Transformer-based 3D reconstruction pipelines and their instability under low-precision execution.

#Review #3D Reconstruction #Transformer #Sparse Linear Attention #FP8-aware QAT #Model-Agnostic #Knowledge Distillation #Algorithm-System Co-design

May 12, 2026

[Paper Review] Images in Sentences: Scaling Interleaved Instructions for Unified Visual Generation

This paper addresses the performance degradation that existing multimodal generation models exhibit when handling complex multi-image instructions.

#Review #Multimodal Generation #Interleaved Instructions #Object Binding #Transformer #Multimodal Image Editing #Scalable Data Engine

May 12, 2026

[Paper Review] From Web to Pixels: Bringing Agentic Search into Visual Perception

This paper addresses the limitations that arise because existing visual perception models rely only on visual cues within the image or on frozen knowledge inside the model.

#Review #Perception Deep Research #WebEyes #Pixel-Searcher #Multimodal Intelligence #Visual Grounding #Search-based Segmentation

May 12, 2026

[Paper Review] Do not copy and paste! Rewriting strategies for code retrieval

This study starts from the problem that existing embedding-based methods for code retrieval overfit to surface-level syntactic features of code, which limits their ability to capture its actual semantic behavior (program behavior).

#Review #Code Information Retrieval #Large Language Models #Rewriting #Embedding #PseudoCode #Token Entropy #Representational Analysis

May 12, 2026

[Paper Review] Continual Harness: Online Adaptation for Self-Improving Foundation Agents

This paper aims to build a framework in which embodied agents can learn and evolve autonomously in complex, long-horizon environments without explicit domain scaffolding.

#Review #Foundation Agents #Continual Harness #Online Adaptation #Embodied AI #In-Context Learning #Reset-Free Training #Process Reward Models

May 12, 2026

[Paper Review] Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

This study addresses the inefficient use of limited labeled training data during language-model post-training.

May 12, 2026

[Paper Review] Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values

This study starts from the observation that the value systems of autonomous agents differ fundamentally from those of their underlying LLMs, and that no tool exists to evaluate them systematically. Prior work such as ValueBench and ValueCompass is largely limited to evaluating the values of static text-generation models.

#Review #Autonomous Agents #Value Alignment #Benchmark #Agentic Modality #Harness Alignment #Skill Steering

May 12, 2026

[Paper Review] A Causal Language Modeling Detour Improves Encoder Continued Pretraining

This paper aims to overcome the limitations of MLM-only training in encoder Continued Pretraining for domain adaptation. The authors argue that models underperform on domain-specific data (biomedical in particular) because of the rigidity of the training objective itself.

#Review #Continued Pretraining #Causal Language Modeling #Masked Language Modeling #Domain Adaptation #Biomedical Encoders #CKA #Freeze Interventions #ModernBERT

May 12, 2026

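[sglang] Fusing Input Layer Normalization and FP8 Quantization in the DeepseekV4 Model for Better Performance

Fuses input layer normalization with FP8 quantization in the DeepseekV4 model to improve GPU compute efficiency.

#AI #Deep Learning #Optimization #FP8 #GPU

May 12, 2026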

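[vllm] Optimizing vLLM W8W8 Group Quantization: Eliminating Divmod with a 2D Grid

Improves performance by replacing divmod operations with a 2D grid in vLLM's W8W8 group quantization kernel.

#vLLM #CUDA #GPU Optimization #Quantization #Performance #divmod #2D-grid

May 12, 2026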

[sglang] Introducing causal_conv1d_update_v2 to Improve NPU Performance

Uses causal_conv1d_update_v2 on NPUs to substantially improve model inference speed.

#NPU #Performance Optimization #Deep Learning #LLM #SGLang

May 12, 2026

[vllm] vLLM Mamba2 SSD Kernel Warmup: How First-Request Latency Dropped by 91%

An analysis of the Triton kernel warmup optimization that cut first-request latency for vLLM Mamba2 models by 91%.

#vLLM #Mamba2 #Triton #Kernel Optimization #Latency Reduction #Deep Learning Inference

May 12, 2026

[onnxruntime] [ONNX Runtime] Optimizing the PagedAttention FA Path and Improving Correctness

Replaces the heuristic max_query_len in the PagedAttention FA path with the actual computed value, improving performance and resolving CUDA errors.

#ONNXRuntime #CUDA #FlashAttention #Optimization #LLM

May 12, 2026

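[vllm] Maximizing MLA Performance in vLLM: Fusing the RoPE, KV Cache, and q_concat Operations

Analyzes an optimization that merges the RoPE, KV Cache, and q_concat operations of MLA models into a single kernel in vLLM, substantially improving inference performance.

#vLLM #LLM #CUDA #Optimization #MLA #DeepSeek-R1

May 11, 2026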

[sglang] SGLang NPU Optimization: Introducing Dual-Stream Parallelism for MoE Models

Splits Shared Expert and Routed Expert computation into independent streams on NPUs, improving MoE model throughput by 11% or more.

#SGLang #NPU #MoE #Performance Optimization #Deep Learning

May 11, 2026