Review

[Paper Review] BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

This paper addresses benchmark saturation: the latest frontier LLMs now reach Pass@1 scores above 90% on existing coding benchmarks such as LiveCodeBench.

#Review #Frontier LLM #Coding Benchmark #Task Evolution #Solution-Centric #Reinforcement Learning #Executable Semantics #Self-Improvement

June 3, 2026

[Paper Review] AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

This paper addresses the limitation that existing benchmarks focus on short-horizon or single-turn performance and therefore fail to evaluate the long-horizon, iterative optimization processes required in real-world science and engineering.

#Review #AutoLab #Long-horizon optimization #Frontier models #Agentic benchmarks #Closed-loop optimization #System optimization #CUDA kernel optimization

June 3, 2026

[Paper Review] Audio Interaction Model

This paper addresses the limitation that existing Large Audio Language Models (LALMs) remain passive (offline) models that process a fixed, complete audio input and therefore fail to reflect the real-time nature of human interaction.

#Review #Large Audio Language Models #Streaming Interaction #Perceive–Decide–Respond #FIFO Scheduling #SoundFlow #StreamAudio-2M #Proactive-Sound-Bench

June 3, 2026

[Paper Review] Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents

This paper proposes Agent libOS to address the ambiguous security boundaries of existing LLM agent frameworks and the lack of infrastructure for long-running agents.

#Review #LLM Agents #Library OS #Runtime Security #Capability-based Security #Object Memory #Tool-use #System Architecture

June 3, 2026

[Paper Review] Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging

This paper aims to resolve the excessive expert-read I/O bottleneck that arises in LLM-scale model merging.

#Review #Model Merging #LLM Systems #Parameter-Efficient Adaptation #Expert Access-Set #I/O Budgeting #Weight-Space Merging #MergePipe

June 3, 2026

[Paper Review] AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification

This paper addresses the reliability problems that LLM-based financial auditing models face when verifying numerical consistency in structured XBRL data. Existing LLM agents are strong at retrieval and tool use, but they leave the critical numerical computation and rule application to the model's own reasoning, which leads to high error rates.

#Review #XBRL #Financial Auditing #Multi-agent Framework #Symbolic Environment #Graph-grounded #Numerical Consistency

June 3, 2026

[Paper Review] AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

This paper addresses motion collapse and training instability in one-step autoregressive video generation.

#Review #One-Step Autoregressive Video Generation #Asymmetric Adversarial Distillation #Diffusion Models #Bidirectional Discriminator #Holistic Discrimination #Distribution Matching Distillation

June 3, 2026

[Paper Review] αDepth: Learning Single-Pass Soft Boundary Decomposition for Stereo Conversion

This paper tackles soft boundary handling when generating high-quality stereo imagery from monocular images. Existing depth estimation models assign only a single depth value per pixel, so they cannot handle the depth ambiguity caused by color blending at boundaries, and they produce distorted 3D structure as a result.

#Review #Stereo Conversion #Soft Boundary Decomposition #Circular Alpha Representation #Depth Ambiguity #Layered Representation #Alpha Matting

June 2, 2026

[Paper Review] Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues

This paper points out that today's personalized LLM agents remain passive responders that merely cater to user preferences, and it argues for 'Proactive Personalization' with more active persuasion and guidance capabilities.

#Review #LLM #Personalization #Persuasive Dialogue #Persona-Sensitive Influencing #Proactive Agent #Benchmark

June 2, 2026

[Paper Review] World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning

This paper addresses the reliability problems that arise when World Models are combined with MLLMs for future-oriented visual reasoning. Naive combinations share a limitation: the generated rollouts are stochastic and sometimes inaccurate for the task, yet the agent cannot control them effectively.

#Review #World Models #Multimodal Large Language Models (MLLMs) #Controlled Concrete Reasoning #Privileged-Future On-Policy Self-Distillation (PF-OPSD) #Future Prediction #Simulation-Control

June 2, 2026

[Paper Review] Value-Aware Stochastic KV Cache Eviction for Reasoning Models

This paper addresses the severe memory and compute bottlenecks caused by the long outputs (Chain of Thought) that reasoning models generate during complex reasoning.

#Review #KV Cache #Eviction #Reasoning Models #Stochasticity #Value-Awareness #Sparse Attention #Large Language Models

June 2, 2026

[Paper Review] Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

This paper addresses several problems in real-time object detection models: dependence on NMS, unnecessary parameter bloat, reduced training efficiency, and failures on small objects.

#Review #YOLO26 #Real-Time Object Detection #End-to-End #NMS-Free #MuSGD #STAL #Instance Segmentation

June 2, 2026

[Paper Review] Trust Region On-Policy Distillation

This paper addresses the training instability and inefficiency of On-Policy Distillation (OPD) for Small Reasoning Models (SRMs).

#Review #On-Policy Distillation #Reasoning Models #Trust Region #Policy Gradient #Knowledge Distillation #Language Models

June 2, 2026

[Paper Review] TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL

This study sets out to overcome the limitations of static datasets in RL training for visual reasoning.

#Review #Reinforcement Learning #Visual Reasoning #Online Environment #Multimodal Large Language Models #Rule-Verifiable #Curriculum Learning

June 2, 2026

[Paper Review] Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

This paper addresses the problem that Test-Time Scaling, used to improve LLM reasoning performance, incurs excessive compute cost and latency.

#Review #Test-Time Scaling #Adaptive Sampling #Reinforcement Learning #Markov Decision Process #Inference Efficiency #Large Language Models

June 2, 2026

[Paper Review] Prior Availability in Industrial Visual Sim-to-Real: A Review of CAD-Guided and CAD-Unavailable Regimes

This paper systematically reframes two core problems facing industrial vision systems: 'data availability' and the 'domain gap to real deployment environments'. Existing work narrowly interprets sim-to-real transfer as merely converting synthetic images into real ones.

#Review #Industrial Visual Sim-to-Real #Prior Availability #CAD-Guided Vision #CAD-Unavailable Inspection #6D Object Pose Estimation #Industrial Anomaly Detection #Domain Gap

June 2, 2026

[Paper Review] Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

This study seeks to identify why linear probes used for deception detection in LLMs degrade so severely in real-world settings.

#Review #LLM #Deception Detection #Linear Probes #Scaling Laws #Robustness #Geometric Analysis #Activation Engineering

June 2, 2026

[Paper Review] PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps

This paper points out that existing Embodied Navigation research treats Vision-Language Navigation (VLN) and Object Goal Navigation (ObjNav) as separate problems and relies on heavy cross-modal training or large-scale VLMs to link them.

#Review #Embodied Navigation #Platonic Representation Hypothesis #Topological Map #Blind Matching #Zero-shot Navigation #Cross-modal Alignment

June 2, 2026

[Paper Review] PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

This study aims to maximize performance by resolving the residual errors of PaddleOCR-VL-1.5, a high-performing 0.9B-parameter model. The authors observed that simply adding more training data cannot fundamentally resolve the errors that occur on long-tail document layouts, complex tables, rare scripts, and similar cases.

#Review #Document Parsing #Vision-Language Model #Under-Optimized Region #Progressive Post-Training #Data Engine #GRPO

June 2, 2026

[Paper Review] OCC-RAG: Optimal Cognitive Core for Faithful Question Answering

This paper addresses the problem that general-purpose LLMs rely on the vast knowledge stored in their parameters and consequently ignore the given context or produce hallucinations.

#Review #Small Language Models #Context Question Answering #Multi-hop Reasoning #Faithfulness #Mid-training #Synthetic Data #Abstention

June 2, 2026