Review

[Paper Review] SoundnessBench: Can Your AI Scientist Really Tell Good Research Ideas from Bad Ones?

This paper argues that, as autonomous AI agents automate the research pipeline, a first-gate stage that screens ideas for soundness before experiments are run indiscriminately is essential.

#Review #Autonomous AI Agents #Research Evaluation #Methodological Soundness #Large Language Models #Optimism Bias #Scientific Benchmarking #First-gate Evaluation

May 31, 2026

[Paper Review] Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

This paper points out that existing spatial reasoning benchmarks rely on the unrealistic assumption that visual observations are always sufficient and reliable.

#Review #Vision-Language Models #Spatial Reasoning #Observational Uncertainty #Abstention #Occlusion #Perspective Ambiguity #Embodied AI

May 31, 2026

[Paper Review] SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks

This work addresses the problem that self-play for LLMs has been confined to rule-verifiable domains such as math and code, and that on open-ended tasks it remains dependent on external data or frontier models.

#Review #Self-Play #Open-Ended Tasks #Reinforcement Learning #Rubric Reward #Retrieval-Augmented Generation #Co-Evolution #Data-Free

May 31, 2026

[Paper Review] SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

This paper proposes SANA-Streaming to address the problems of maintaining temporal consistency and limited inference performance in real-time streaming video-to-video (V2V) editing.

#Review #Diffusion Transformer #Streaming Video Editing #Hybrid Architecture #Cycle-Reverse Regularization #Mixed-Precision Quantization #Real-time Inference

May 31, 2026

[Paper Review] SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search

This paper proposes the SAAS framework to address the severe over-search problem in agentic search systems.

#Review #Agentic Search #Reinforcement Learning #Over-Search Mitigation #Knowledge Boundary #Search Efficiency #Reward Hacking

May 31, 2026

[Paper Review] Representation Forcing for Bottleneck-Free Unified Multimodal Models

This paper proposes Representation Forcing (RF) to address the structural bottleneck that arises because existing unified multimodal models (UMMs) depend on a frozen VAE.

#Review #Unified Multimodal Models #Representation Forcing #Pixel-space Diffusion #Vector Quantization #End-to-End Learning #Bottleneck-Free #Mixture-of-Transformers

May 31, 2026

[Paper Review] Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents

This paper addresses the problem that state-of-the-art GUI agents, despite strong performance, lack the ability to recognize and recover from policy-induced errors that arise during execution, which limits their real-world deployment.

#Review #GUI Agent #Robustness #Trajectory Synthesis #Policy-Induced Errors #Error Recovery #VLM

May 31, 2026

[Paper Review] PEEK: Picking Essential frames via Efficient Knowledge distillation

This paper aims to address the bottleneck that modern vision-language models (VLMs) can process only a limited number of frames for video understanding.

#Review #Video-language models #Frame selection #Knowledge distillation #Video captioning #Query-free sampling #Temporal modeling

May 31, 2026

[Paper Review] OpenSkillEval: Automatically Auditing the Open Skill Ecosystem for LLM Agents

This paper addresses the opacity and inefficiency of evaluation in the rapidly expanding ecosystem of skills for LLM agents. There is currently no systematic analysis of whether the many skills distributed by the community actually improve performance, or of how they interact with specific models and frameworks.

#Review #LLM Agents #Agent Skills #Automatic Evaluation #Skill Ecosystem #Benchmarking #Trajectory Trace Analysis #Artifact Evaluation

May 31, 2026

[Paper Review] One Click per Cell Type Suffices: Training-free Group Interaction for Cell Instance Segmentation

This work addresses the problem that existing cell instance segmentation models are tied to their training data, so their performance degrades sharply on out-of-distribution (OOD) cell types.

#Review #Cell Instance Segmentation #Foundation Models #Group Prompting #Chain-of-Prompts #Training-free #Histopathology #SAM

May 31, 2026

[Paper Review] Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation

This paper addresses the limitation that existing selective OPD methods use only token uncertainty (entropy) or teacher-student disagreement (divergence) as the token selection criterion.

#Review #On-policy Distillation #Knowledge Distillation #Token Teachability #Selective OPD #Teacher-Student Compatibility

May 31, 2026

[Paper Review] Memory-Bound but Not Bandwidth-Limited: The Physical AI Inference Gap in Batch-1 LLM Decode

This paper shows that batch-1 LLM decode, which is essential in Physical AI settings, is not limited by HBM bandwidth alone but is also heavily constrained by CPU-side launch overhead.

#Review #Batch-1 Inference #LLM Decode #HBM Bandwidth #CUDA Graphs #Launch Overhead #Physical AI

May 31, 2026

[Paper Review] Mellum2 Technical Report

A detailed review of the paper 'Mellum2 Technical Report', posted to arXiv by Marko Kojic.

#Review #LLM #Pretraining #Model Architecture #Technical Report #Evaluation #Training Pipeline

May 31, 2026

[Paper Review] MAAT: Multi-phase Adapter-Aware Targeted Unlearning

This paper addresses a critical gap in existing machine unlearning research: the complete absence of evaluation on 'Why-type' questions, which deal with causal knowledge.

#Review #Machine Unlearning #LoRA #Causal Knowledge #5WBench #Adapter-Aware #SVD Pruning

May 31, 2026

[Paper Review] Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models

This paper addresses the limitations that existing connector-based video generation models face in achieving high visual quality and complex logical reasoning at the same time.

#Review #Video Unified Models #Unified Progressive Frequency Bridging #Reasoning-driven Generation #Connector-based #Flow-matching #Visual Fidelity

May 31, 2026

[Paper Review] LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

This paper addresses two problems in existing long-context reinforcement learning: low data difficulty and sparse reward signals.

#Review #Long-Context #Reinforcement Learning #Rubric Reward #Search Agent Trajectories #Tiered Distractors #Multi-hop Reasoning

May 31, 2026

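[Paper Review] LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

Real-world data analysis is not a single step but an iterative process in which state accumulates and changes continuously over a long session. Existing data analysis benchmarks, however, mainly evaluate independent or short interactive tasks, and so do not adequately test an agent's ability to track and revise state within a complex analysis session.

#Review #Agentic Data Analysis #Long-Horizon #State Management #Benchmark #LLM Agents #State-Evolution

May 31, 2026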

[Paper Review] Linear Scaling Video VLMs for Long Video Understanding

This paper aims to address the quadratic time complexity that modern Video VLMs face when handling long videos or real-time streaming tasks.

#Review #Video VLM #Long-video Understanding #Linear Scaling #StateKV #KV Cache Compression #Attention Approximation

May 31, 2026

[Paper Review] Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models

This paper proposes Light Interaction to address the excessive compute cost and inference latency that arise during long-horizon generation with interactive video world models.

#Review #Interactive Video World Models #Inference Acceleration #Adaptive Context Management #Denoising Cache Acceleration #3D Sparse Attention #Autoregressive Generation

May 31, 2026

[Paper Review] How can embedding models bind concepts?

This paper focuses on the problem that recent vision-language embedding models such as CLIP recognize individual concepts well but fail at concept binding, that is, correctly combining those concepts to compose objects.

#Review #Concept Binding #Embedding Models #Compositional Generalization #Multiplicative Interaction #Representation Geometry #CLIP #Transformer

May 31, 2026