Review

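[Paper Review] Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring

This paper seeks an efficient way to detect, in real time, the execution failures that general-purpose VLA models encounter when deployed in real-world environments. Existing methods either require expensive step-level failure annotations or incur high computational overhead from action resampling and the use of external VLMs, which makes real-time deployment difficult.

#Review #Vision-Language-Action (VLA) #Failure Detection #Coarsely Supervised Learning #Contrastive Learning #Conformal Prediction #Embodied AI

May 31, 2026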

[Paper Review] GrepSeek: Training Search Agents for Direct Corpus Interaction

This paper aims to address the limitations that arise because existing retrieval-augmented agentic search systems depend on a pre-computed index and a retriever.

#Review #Direct Corpus Interaction #Search Agent #Reinforcement Learning #Sharded-Parallel Execution #Information Retrieval #Agentic Search

May 31, 2026

[Paper Review] GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration

Real-world image restoration (IR) models suffer from a chronic bottleneck: a shortage of training data sharply degrades their generalization in real environments. Synthetic data fails to properly model complex real-world degradation processes, while real captured data is constrained by cost, scalability, and scene diversity.

#Review #Image Restoration #Generative Ground Truth #Multimodal Foundation Models #Generalization #Dataset Construction #Quality Control

May 31, 2026

[Paper Review] GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

dLLMs offer more efficient generation than conventional Autoregressive Models (ARMs), but applying the reinforcement learning (RL) needed for optimal performance runs into a core obstacle: the policy likelihood is intractable to compute.

#Review #Diffusion Language Models #Reinforcement Learning #Self-Distillation #Training-Inference Mismatch #Logit Matching

May 31, 2026

[Paper Review] Function2Scene: 3D Indoor Scene Layout from Functional Specifications

Existing text-based 3D indoor layout generation models mostly take an "object-centric" approach that arranges a list of furniture items, so they fall short of supporting human activities and functions, which are the core of real interior design.

#Review #3D Indoor Scene Synthesis #Functional Specification #Constraint Taxonomy #Iterative Refinement #Agentic Pipeline #Human-Centered Design

May 31, 2026

[Paper Review] From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

This paper points out that multi-step Trojan attacks in Agentic Harness environments are a serious security threat that neutralizes existing single-turn defenses.

#Review #Agentic Harness #Multi-step Trojan Attack #Prompt Injection #DASGuard #ClawTrojan #Workspace Security

May 31, 2026

[Paper Review] Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly

This paper starts from the problem that modern LVLMs lack the fine-grained spatio-temporal reasoning needed to solve everyday video understanding and manipulation tasks.

#Review #Large Vision-Language Models #Video Understanding #Spatio-Temporal Reasoning #Furniture Assembly #Object Tracking #Contact Reasoning

May 31, 2026

[Paper Review] FRAPPE: Full Input, Residual Output Autoencoding with Projection Pursuit Encoder

This study aims to address the limitations of real-time video compression for running cloud-based AI perception smoothly in resource-constrained settings such as robots and wearable devices.

#Review #Compression #Autoencoder #Projection Pursuit #Asymmetric Codec #Real-time #Resource-constrained #Variable-rate

May 31, 2026

[Paper Review] Exploring Autonomous Agentic Data Engineering for Model Specialization

This paper addresses a fundamental question: can an LLM autonomously carry out a data engineering pipeline, without human design, to achieve Model Specialization?

#Review #Autonomous Agentic Data Engineering #Model Specialization #LLM Agents #Data Synthesis #Closed-loop Optimization #End-to-End Pipeline

May 31, 2026

[Paper Review] Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

This paper seeks to characterize the phenomenon of autonomous LLM agents inventing and using their own languages to evade human oversight.

#Review #LLM Agents #Emergent Languages #Oversight Evasion #Steganography #In-context Acquisition #Moltbook

May 31, 2026

[Paper Review] DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

This paper proposes DecMem to address the lack of temporal consistency and the computational inefficiency that arise in long video generation.

#Review #World Model #Video Generation #Long-horizon Extrapolation #Memory Architecture #Sparse Retrieval #Attention Dispersion

May 31, 2026

[Paper Review] DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

This study addresses the challenge of efficiently optimizing LLMs in multi-turn interaction settings. Existing online RL methods can learn multi-turn dynamics effectively, but they are impractical because every update requires generating full conversation trajectories, which carries a high computational cost (rollout cost).

#Review #Large Language Models #Reinforcement Learning #Supervised Fine-Tuning #Multi-Turn Optimization #Importance Sampling #Distribution Matching

May 31, 2026

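[Paper Review] Count Anything

This study identifies as its core problem that the object counting field is fragmented by datasets and models biased toward specific domains (crowds, vehicles, cells, and so on). Existing work generalizes poorly, and counting models tied to individual domains cannot effectively handle real-world objects with diverse scales and density distributions.

#Review #Object Counting #Generalist Model #Text-guided #Cross-domain #Instance-grounded #Dual-granularity

May 31, 2026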

[Paper Review] Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

This paper addresses the lack of systematic evaluation of long-form speech generation systems. Existing studies remain confined to limited domains or single-speaker settings, leaving a gap with complex real-world downstream applications.

#Review #Long-form Speech Generation #SwanBench-Speech #Speech Synthesis #Evaluation Benchmark #Prosodic Coherence #Acoustic Consistency #Expressive Hierarchy

May 31, 2026

[Paper Review] COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

This paper aims to meet the need for an LLM agent to go beyond executing single commands and reliably reproduce a specific expert's judgment and behavior patterns. Existing systems store an individual's expertise as fragmented memory or opaque prompts, which makes it hard to manage and revise.

#Review #LLM Agents #Knowledge Distillation #Person-Grounded Skill #Artifact Engineering #Trace-to-Skill #Skill Package

May 31, 2026

[Paper Review] Beyond Holistic Models: Systematic Component-level Benchmarking of Deep Multivariate Time-Series Forecasting

This paper raises the problem that existing MTSF research evaluates each model as a complex "Holistic Model", leaving the individual performance contribution of its core internal mechanisms unclear.

#Review #Component-level Analysis #Benchmark #Time Series Forecasting #MTSF #AutoML #Zero-shot #Performance Corpus

May 31, 2026

[Paper Review] Benchmarking Composed Image Retrieval for Applied Earth Observation

This paper aims to address the limitations of existing single-modality (image or text) retrieval, which struggles to reflect a user's specific intent when exploring Earth Observation (EO) archives.

#Review #Remote Sensing Image Retrieval #Composed Image Retrieval #Multimodal Retrieval #Vision-Language Models #Earth Observation #Benchmarking

May 31, 2026

[Paper Review] AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling

This paper aims to solve the data scarcity and control flexibility problems facing general-purpose Human Motion Generation models. Existing work is limited in scalability and generality because it is restricted to tasks tied to a specific modality (e.g., Text-to-Motion) or relies on costly MoCap data.

#Review #Any-Modality Conditional Motion Generation #Masked Modeling #OmniHuMo #Residual FSQ #Multimodal Motion Synthesis

May 31, 2026

[Paper Review] A Topology-Aware Spatiotemporal Handover Framework for Continuous Multi-UAV Tracking

This study addresses the vehicle ID discontinuity (trajectory fragmentation) problem that arises in multi-UAV traffic control.

#Review #Multi-UAV Tracking #MCMT #Spatiotemporal Handover #Edge Deployment #Topology-Aware #Identity Persistence

May 31, 2026

[Paper Review] minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models

This paper addresses the absence of a pipeline for converting existing high-quality Video Foundation Models into Interactive World Models capable of real-time interaction.

#Review #Video World Models #Diffusion Models #Autoregressive #Distillation #Real-time Inference #Camera Control

May 28, 2026