Review

[Paper Review] Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics

This paper addresses the 'Unfaithfulness' problem: the Chain of Thought (CoT) generated by Large Reasoning Models (LRMs) does not always agree with the model's final output.

#Review #Large Reasoning Models #Chain of Thought #Probe Trajectories #Representation Engineering #AI Safety #Max-pooling #Interpretability

May 18, 2026

[Paper Review] Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

This paper proposes the Model-Adaptive Tool Necessity framework, grounded in each model's own capability, to address the performance degradation and opacity that arise in Adaptive Tool Use by LLM agents.

#Review #LLM #Tool Use #Meta-cognition #Knowing-Doing Gap #Representation Engineering #Model-Adaptive

May 18, 2026

[Paper Review] MixSD: Mixed Contextual Self-Distillation for Knowledge Injection

This paper addresses the Catastrophic Forgetting that occurs when new knowledge is injected into an LLM.

#Review #Knowledge Injection #Self-Distillation #Catastrophic Forgetting #Language Models #Distribution Alignment #Fine-tuning

May 18, 2026

[Paper Review] MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

This paper identifies a limitation of current GUI agents: during long-horizon tasks, they struggle to maintain task state as the interface changes.

#Review #GUI Agents #Multimodal Memory #Long-Horizon #Memory Control #MLLM #Working Memory #Episodic Memory

May 18, 2026

[Paper Review] Measuring Maximum Activations in Open Large Language Models

This paper re-examines the conventional belief that, in the current open LLM ecosystem, the dynamic range of activations simply scales with parameter count. It systematically measures each model's Maximum Activation Magnitude (MM) to identify risks at deployment time.

#Review #Large Language Models #Activation Range #Quantization #Maximum Activation #LLM Inference #Residual Stream #Model Scaling

May 18, 2026

[Paper Review] LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

This paper proposes LongLive-2.0, an infrastructure that integrates system and algorithm design, to address the memory bottlenecks and low compute efficiency that arise in long video generation.

#Review #Long Video Generation #NVFP4 #Sequence Parallelism #Autoregressive Diffusion #KV Cache Quantization #Balanced SP

May 18, 2026

[Paper Review] LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs

This work focuses on the persistent computational-complexity and efficiency bottlenecks that arise when scaling Video LLMs to long-form video understanding.

#Review #Video LLMs #Vision Encoder #Token Compression #Compressed Token Distillation #Long-form Video Understanding #Spatio-temporal Modeling

May 18, 2026

[Paper Review] Lance: Unified Multimodal Modeling by Multi-Task Synergy

This paper addresses the performance degradation and limited task coverage that arise when existing multimodal models try to unify two heterogeneous objectives, understanding and generation.

#Review #Unified Multimodal Modeling #Multi-Task Synergy #Dual-Stream Architecture #Modality-Aware Rotary Positional Encoding #Autoregressive Modeling #Flow Matching

May 18, 2026

[Paper Review] KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

Existing alignment methods for video generation models mainly rely on noise-based exploration or SDE-based surrogate policies, which conflicts with the nature of distilled AR models that operate under deterministic ODE dynamics.

#Review #Autoregressive Video Generation #Reinforcement Learning #Policy Optimization #Flow Matching #KV Caching #Causal-Semantic Exploration #Trajectory Velocity Energy

May 18, 2026

[Paper Review] Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models

This paper addresses a structural limitation of modern interactive video world models: the rigidity of the Action Interface.

#Review #Interactive Video World Model #Natural Language Action Interface #Multi-Entity Control #Cross-Entity Transfer #Streaming Inference #Self-Forcing Distillation

May 18, 2026

[Paper Review] Geometric Phase Transition Enables Extreme Hippocampal Memory Capacity

This study seeks to explain how biological memory systems dramatically expand their information capacity without a physical increase in the number of neurons.

#Review #Hippocampal Memory #Geometric Stability #Neural Manifold #Population Code #Excitatory-Inhibitory Dynamics #Crystalline Code

May 18, 2026

[Paper Review] GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

This paper addresses the difficulty current MLLMs have in performing social reasoning grounded in subtle non-verbal cues in multi-person videos.

May 18, 2026

[Paper Review] From Runnable to Shippable: Multi-Agent Test-Driven Development for Generating Full-Stack Web Applications from Requirements

This paper aims to address the problem that current coding agents fail to meet more than 70% of functional requirements when generating web applications. Existing agents verify their work using only code files or terminal output, but the correctness of a web application can only be assessed through dynamic interaction in a browser environment.

#Review #Multi-Agent System #Test-Driven Development #Web Development #Code Generation #Closed-Loop Validation #Large Language Model

May 18, 2026

[Paper Review] FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models

This paper proposes FINESSE-Bench to overcome the limitations of existing financial benchmarks and to precisely diagnose the practical financial expertise of LLMs.

#Review #Large Language Models #Financial Benchmarking #Difficulty Hierarchy #Technical Analysis #LLM-as-Judge #Professional Competence #Financial Reasoning

May 18, 2026

[Paper Review] Evaluating Cognitive Age Alignment in Interactive AI Agents

This paper addresses the problem that state-of-the-art MLLM agents, despite high task accuracy, give explanations mismatched to a child's cognitive level or attempt overly complex reasoning when interacting with real children.

#Review #Cognitive Age Alignment #MLLM Agents #ChildAgentEval #Developmental Psychology #Skill-Guided Distillation #WISC #Interactive Evaluation

May 18, 2026

[Paper Review] EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

This paper proposes EndPrompt to address the heavy compute requirements and data-collection difficulty involved in extending an LLM's context window.

#Review #Long-Context Extension #EndPrompt #Terminal Anchoring #Positional Interpolation #RoPE #Transformer #Sparse Supervision

May 18, 2026

[Paper Review] E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring

This paper addresses the performance degradation that occurs when Low-bit Quantization is applied after Model Merging.

#Review #Post-Merge Quantization #Model Merging #PTQ #Quantization Deviation #Merged-Weight Anchoring #Expert-Guided Calibration

May 18, 2026

[Paper Review] CompactAttention: Accelerating Chunked Prefill with Block-Union KV Selection

This paper proposes CompactAttention to overcome the performance limitations of Block-Sparse Attention and Query-Subsampled KV Selection in existing Chunked Prefill settings.

#Review #Chunked Prefill #KV Selection #Block-Sparse Attention #Paged Attention #Zero-Copy Execution #Long-Context LLM

May 18, 2026

[Paper Review] Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis

This paper addresses two problems: the spatial ambiguity of existing text-driven 3D generation, and the infinite loops and instability that existing agentic frameworks face during holistic room generation.

#Review #Agentic AI #3D Room Synthesis #MLLM #Blender Code #Execution Harness #Cross-stage Memory #Top-down View

May 18, 2026

[Paper Review] Code as Agent Harness

This paper points out that, in LLM-based agent systems, code is moving beyond being a mere target artifact and becoming the system's core operational infrastructure.

#Review #Agent Harness #Coding Agent #Harness Engineering #Agentic AI #Code-as-Agent-Harness #Executable Verification

May 18, 2026