Review

[Paper Review] Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation

This paper argues that the benchmark gains reported by modern omni-modal LLMs may come from exploiting visual shortcuts rather than from genuine modality integration.

#Review #Omni-modal LLM #Visual Leakage #OmniClean #Staged Post-Training #Self-Distillation #Reinforcement Learning

May 14, 2026

[Paper Review] Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

This survey addresses the unpredictable failures and structural rigidity that emerge as LLM-based multi-agent systems grow more complex.

#Review #LLM-based Agents #Multi-Agent Systems #Multi-Agent Collaboration #Failure Attribution #Self-Evolution

May 14, 2026

[Paper Review] BOOKMARKS: Efficient Active Storyline Memory for Role-playing

Memory systems in existing Role-playing Agents (RPAs) rely mainly on recurrent summarization, which inevitably loses important details during compression.

#Review #Role-playing Agents #Memory Systems #Search-based Grounding #Active Grounding #Passive Updating #Long-horizon Consistency #Efficiency #Storyline Memory

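May 14, 2026

[Paper Review] BEAM: Binary Expert Activation Masking for Dynamic Routing in MoE

This paper proposes BEAM to address the computational redundancy caused by the fixed Top-K routing of standard MoE models. The conventional Top-K mechanism assigns the same number of experts to every token regardless of that token's complexity, which wastes computation.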
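To make the redundancy concrete, here is a minimal sketch contrasting fixed Top-K routing with a dynamic per-token mask. The thresholded variant (`threshold_mask` and its parameter `tau`) is a hypothetical, generic illustration assumed here for comparison; it is not BEAM's actual masking mechanism, and the router logits are toy values.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax over the expert dimension (numerically stabilized)."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def topk_mask(logits: np.ndarray, k: int) -> np.ndarray:
    """Standard fixed Top-K: every token activates exactly k experts."""
    idx = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k largest logits
    mask = np.zeros_like(logits, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return mask


def threshold_mask(logits: np.ndarray, k: int, tau: float) -> np.ndarray:
    """Hypothetical dynamic variant (not BEAM): keep only the Top-K experts
    whose router probability is at least tau, but never fewer than one."""
    probs = softmax(logits)
    mask = topk_mask(logits, k) & (probs >= tau)
    top1 = probs.argmax(axis=-1)
    mask[np.arange(logits.shape[0]), top1] = True  # always keep the best expert
    return mask


# Toy router logits for 2 tokens x 4 experts:
# token 0 is "easy" (one dominant expert), token 1 is "hard" (two strong experts).
router_logits = np.array([[5.0, 0.0, 0.0, 0.0],
                          [1.0, 1.0, 0.0, 0.0]])

print(topk_mask(router_logits, k=2).sum(axis=-1))               # [2 2]
print(threshold_mask(router_logits, k=2, tau=0.1).sum(axis=-1))  # [1 2]
```

With fixed Top-K both tokens pay for two experts, while the thresholded mask drops the second expert for the easy token, which is the kind of per-token saving the summary describes.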

#Review #Mixture-of-Experts #Dynamic Routing #Expert Sparsity #Inference Acceleration #Binary Expert Activation Masking #vLLM

May 14, 2026

[Paper Review] Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

This paper identifies and addresses teacher-side exposure mismatch, an overlooked bottleneck in On-Policy Self-Distillation (OPSD) for LLM reasoning.

#Review #Self-Distillation #LLM Reasoning #Teacher Exposure #On-Policy #Adaptive Control #Reinforcement Learning #Beta-policy

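May 14, 2026

[Paper Review] Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

This work proposes a simple, unified recipe for building models that reach gold-medal-level reasoning on advanced mathematics and science Olympiad problems. Existing general-purpose reasoning models show short-term gains in mathematical problem solving but lack the rigorous reasoning and verification skills that complex proof problems demand.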

#Review #Olympiad Reasoning #Reinforcement Learning #Test-time Scaling #Supervised Fine-tuning #Reasoning Models #Proof-search #Reverse-Perplexity Curriculum

May 14, 2026

[Paper Review] ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

This paper addresses the computational-efficiency and architectural-compatibility problems that existing visual reasoning methods face.

#Review #Visual Reasoning #Functional Token #LA-GRPO #Autoregressive Generation #Multimodal LLM #Agentic Reasoning

May 14, 2026

[Paper Review] WriteSAE: Sparse Autoencoders for Recurrent State

This paper tackles the matrix cache writes of state-space and hybrid recurrent language models, which existing residual SAEs could not handle.

#Review #Sparse Autoencoders #State-Space Models #Recurrent Neural Networks #Mechanistic Interpretability #Cache-Patching #WriteSAE

May 13, 2026

[Paper Review] Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition

This paper focuses on the performance imbalance that arises when the multilingual ASR model Whisper is fine-tuned on low-resource languages.

#Review #Speech Recognition #Curriculum Learning #Indic Languages #Fine-tuning #Whisper #Studio-bias #Robustness

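May 13, 2026

[Paper Review] Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?

This study argues that scalar score prediction, the conventional approach to image aesthetic assessment, does not faithfully reflect how humans actually compare images. Prior work derives rankings from independently assigned scores, which leads to annotator disagreement and ambiguous aesthetic criteria.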

#Review #Multimodal Large Language Models #Visual Aesthetic Benchmark #Comparative Ranking #Expert Consensus #Aesthetic Evaluation #Fine-tuning

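May 13, 2026

[Paper Review] TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking

This paper aims to overcome the limitations of existing video-based 3D tracking methods by building an efficient dense 3D tracking framework on top of the rich spatiotemporal knowledge of pretrained video generation models.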

#Review #Video Diffusion Transformer #Dense 3D Tracking #Dual-Latent Representation #Temporal RoPE Alignment #Reference-Anchored Tracking

May 13, 2026

[Paper Review] The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs

This paper addresses the limits of reward extrapolation in on-policy distillation of LLMs.

#Review #On-policy Distillation #Reward Extrapolation #Structured Output #Format Adherence #Importance Sampling #LLM

May 13, 2026

[Paper Review] The DAWN of World-Action Interactive Models

This paper points out that existing World Action Models (WAMs) treat world prediction and action generation either as independent parallel branches or as a fixed predict-then-plan pipeline, which limits their ability to model the "action-contingent future" at the heart of driving.

#Review #World-Action Interactive Models #Autonomous Driving #Latent Generative Model #Recursive Interaction #Trajectory Planning #Action-Contingent

May 13, 2026

[Paper Review] ShapeCodeBench: A Renewable Benchmark for Perception-to-Program Reconstruction of Synthetic Shape Scenes

This paper proposes ShapeCodeBench to address benchmark contamination and the limits of fixed datasets when evaluating how well modern multimodal models convert images into code. Prior benchmarks lacked deterministic execution and precise difficulty control, making it hard to diagnose why a model fails.

#Review #Perception-to-Program Reconstruction #Benchmark #Synthetic Data #Renewable Evaluation #Multimodal Models #DSL

May 13, 2026

[Paper Review] SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

This paper addresses the new security threats created by the powerful tool-use capabilities of LLM agents, along with the shortcomings of existing defenses.

#Review #LLM Agent Safety #Memory Mechanism #Guardrail #Adversarial Generation #Information Entropy #Over-refusal Mitigation

May 13, 2026

[Paper Review] RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data

This paper targets a fundamental problem in robotic manipulation: the scarcity of physical interaction data tailored to the task at hand.

#Review #Robotic Manipulation #Vision-Language Models #Video Generation Models #Self-Evolving Framework #Complementary Learning Systems #Data Efficiency #Reinforcement Learning

May 13, 2026

[Paper Review] Revisiting DAgger in the Era of LLM-Agents

This paper addresses the persistent distribution mismatch that arises when post-training LLM agents that carry out long-horizon interactions.

#Review #LLM-Agents #DAgger #Covariate Shift #Multi-Turn Interaction #Post-Training #Imitation Learning

May 13, 2026

[Paper Review] Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation

This paper addresses fundamental limitations of existing Retrieval-Augmented Generation (RAG) systems in multi-hop question answering.

#Review #Retrieval-Augmented Generation #Multi-Hop Reasoning #Program Synthesis #Executable Planning #Compiler-Grounded Self-Repair #Adaptive Retrieval

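May 13, 2026

[Paper Review] Results and Retrospective Analysis of the CODS 2025 AssetOpsBench Challenge

This paper addresses the methodological problem of evaluating whether LLM-based agents deliver real capability in complex industrial settings. Existing benchmarks either rely on oversimplified tasks or fail to adequately measure privacy preservation and multi-step execution, both of which are essential in practice.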

#Review #Agentic AI #Industry 4.0 #Benchmarking #Privacy-preserving #Multi-agent systems #Performance Evaluation #AssetOpsBench

May 13, 2026

[Paper Review] RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

This paper points out that existing ICU benchmarks either reduce clinical decision-making to a simple static problem or fall into the "Behavior Imitation" trap of treating past clinical records as ground truth.

#Review #LLM Agents #ICU #Clinical Decision Support #Hindsight-Annotated Benchmark #Structured Memory #Sequential Decision-Making

May 13, 2026