Review

[Paper Review] CoVEBench: Can Video Editing Models Handle Complex Instructions?

This paper addresses a limitation of existing video editing benchmarks: they focus only on simple, isolated editing tasks and so fail to reflect the complex editing requests of real users.

#Review #Compositional Video Editing #Instruction-guided Editing #Benchmark #Instruction Compliance #Video Fidelity #MLLM-based Evaluation #Fine-grained Diagnostics

June 8, 2026

[Paper Review] Chiaroscuro Attention: Spending Compute in the Dark

This work addresses the inefficiency of the standard Transformer, which applies expensive O(n²d) self-attention uniformly to every token.

#Review #CHIAR-Former #Spectral Entropy #DCT (Discrete Cosine Transform) #Routing Collapse #Operator Routing #Transformer Efficiency

June 8, 2026

[Paper Review] CIPER: A Unified Framework for Cross-view Image-retrieval and Pose-estimation

This paper addresses the inefficiency that arises because the two existing approaches to cross-view geo-localization, image retrieval and pose estimation, are run as separate pipelines.

#Review #Cross-view Geo-localization #Image Retrieval #Pose Estimation #Transformer #Multi-task Learning #Bidirectional Cross-attention

June 8, 2026

[Paper Review] Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses

This paper addresses the problem that Agent Skill updates relying on heuristics or simple success/failure counts are inefficient, and that noisy edits can actually degrade performance.

#Review #LLM Agent #Bayesian Evidence #Skill Evolution #SOP #Harness Engineering #Posterior-Guided Optimization

June 8, 2026

[Paper Review] Answer Presence Drives RAG Rewriting Gains

This paper seeks to determine whether the performance gains from adding a rewriter to a RAG pipeline come from surfacing the actual answer string or from a qualitative improvement (curation) of the evidence documents.

#Review #Retrieval-Augmented Generation (RAG) #LLM Rewriting #Causal Intervention #Answer-string Surfacing #Sentinel-Fragility #Audit Protocol

June 8, 2026

[Paper Review] AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

This paper addresses the structural inefficiency that arises when existing World-Action Models (WAMs) force world modeling and action execution to be coupled at the same temporal resolution.

#Review #Robot Learning #Embodied Manipulation #World-Action Model #Diffusion Transformer #Asynchronous Inference #Horizon-Adaptive #Observation-Guided Context Routing

June 8, 2026

[Paper Review] A Geometric Account of Activation Steering through Angle-Norm Decomposition

Conventional additive steering simply adds a vector along a chosen direction. This changes both the concept direction (angular) and the magnitude of the hidden state (radial) at once, which obscures the geometric meaning of the intervention.

#Review #Activation Steering #Angle-Norm Decomposition #Representation Engineering #LLM Geometry #Spherical Steering

June 8, 2026
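
To make the angular/radial distinction in the activation-steering summary above concrete, here is a minimal sketch in plain Python. It is not the paper's method: the function names, the toy 2-D vectors, and the `norm_preserving_steer` variant (rescaling the steered vector back to the original norm) are illustrative assumptions chosen only to show that adding a vector changes both the angle and the norm.

```python
import math

def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(a * a for a in x))

def angle_between(x, y):
    """Angle in radians between two non-zero vectors."""
    cos = sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error

def additive_steer(h, v, alpha):
    """Additive steering: h' = h + alpha * v."""
    return [a + alpha * b for a, b in zip(h, v)]

def angle_norm_change(h, h_new):
    """Split the effect of steering into (angular change, norm ratio)."""
    return angle_between(h, h_new), norm(h_new) / norm(h)

def norm_preserving_steer(h, v, alpha):
    """Hypothetical variant: same direction as additive steering,
    rescaled so that the norm of h is unchanged.
    Assumes h + alpha * v is not the zero vector."""
    moved = additive_steer(h, v, alpha)
    scale = norm(h) / norm(moved)
    return [scale * a for a in moved]

h = [3.0, 4.0]   # toy hidden state
v = [1.0, 0.0]   # toy steering direction

theta_add, ratio_add = angle_norm_change(h, additive_steer(h, v, 2.0))
theta_sph, ratio_sph = angle_norm_change(h, norm_preserving_steer(h, v, 2.0))

print(f"additive:        angle={theta_add:.4f} rad, norm ratio={ratio_add:.4f}")
print(f"norm-preserving: angle={theta_sph:.4f} rad, norm ratio={ratio_sph:.4f}")
```

In this toy case both variants rotate the hidden state by the same angle, but only the additive one also grows its norm, which is the entanglement the summary describes.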

[Paper Review] dots.tts Technical Report

This paper aims to overcome the limited expressiveness of existing discrete-token TTS models and to achieve stable AR speech generation in a continuous latent space.

#Review #Text-to-Speech #Continuous Latent #Flow-Matching #Autoregressive #AudioVAE #Self-Correction #MeanFlow Distillation

June 7, 2026

[Paper Review] Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

This paper analyzes, and seeks to resolve, why LLMs perform worse when used as general-purpose text embedding models despite their strong zero-shot ability.

#Review #Large Language Model #Text Embedding #Mechanistic Interpretability #Unembedding Matrix #Dimensionality Reduction #Logit Lens #Edge Spectrum

June 7, 2026

[Paper Review] WorldBench: A Challenging and Visually Diverse Multimodal Reasoning Benchmark

This paper proposes WorldBench to overcome the limitation that existing multimodal benchmarks do not adequately measure a model's actual reasoning ability. Many existing benchmarks are biased toward specific domains or lack visual diversity, which tends to overestimate the real problem-solving ability of VLMs.

#Review #Multimodal Reasoning #Benchmark #Vision-Language Model #Visual Diversity #Inference #Evaluation #LLM

June 7, 2026

[Paper Review] When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

This paper points out that existing LLM agent benchmarks assume only an idealized 'Happy Path' environment and therefore fail to properly evaluate the unstable tool execution and error conditions found in practice.

#Review #LLM Agents #Tool-Integrated Reasoning #Fault-Tolerance #Dynamic Replanning #Anomaly Recovery #Benchmark #DAG-based Task Generation

June 7, 2026

[Paper Review] When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

This paper identifies fundamental problems that arise when optimizing prompts for a Multi-Objective LLM Judge, which must weigh several evaluation criteria at once.

#Review #LLM-as-a-Judge #Prompt Optimization #Textual Gradient #Multi-Objective Optimization #Gradient Dilution #Instruction Interference

June 7, 2026

[Paper Review] Watch, Remember, Reason: Human-View Video Understanding with MLLMs

This work covers the shift from short-clip video understanding toward long, complex video settings that span minutes or more and intertwine multiple modalities.

#Review #Multimodal Large Language Models #Video Understanding #Temporal Grounding #Memory Modeling #Long-video Reasoning #Efficient Perception

June 7, 2026

[Paper Review] UniSHARP: Universal Sharp Monocular View Synthesis

Existing monocular novel view synthesis work (e.g., SHARP, Flash3D) is mostly optimized for perspective images from pinhole cameras, which makes it hard to generalize to wide-FoV, fisheye, and panoramic cameras with wide fields of view or strong distortion.

#Review #Novel View Synthesis #3D Gaussian Splatting #Monocular Rendering #Omnidirectional Latent Space #Ray-Based Representation #Universal Camera Model

June 7, 2026

[Paper Review] Towards Retrieving Interaction Spaces for Agentic Search

This paper sets out to address the scalability and efficiency problems of existing Agentic Search methods.

#Review #Agentic Search #Retrieval-Augmented Generation #Direct Corpus Interaction #Interaction Space #Information Retrieval #LLM

June 7, 2026

[Paper Review] Towards Human-Like Interactive Speech Recognition With Agentic Correction and Semantic Evaluation

This paper addresses the problem that modern ASR systems are locked into a single-pass approach and therefore cannot effectively resolve meaning-critical errors in situations that, as in human communication, call for repeated confirmation and correction.

#Review #Interactive ASR #Agentic Correction #Semantic Evaluation #S2ER #Human-AI Alignment #LLM-as-a-Judge

June 7, 2026

[Paper Review] Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

Existing VLMs are confined to the images they observe, so they struggle to infer unseen layouts or to maintain spatial consistency under viewpoint changes. In restricted first-person observation settings in particular, there is a strong need to understand the scene from alternative viewpoints, but current models do not actively address this.

#Review #Vision-Language Models #Spatial Reasoning #World Simulator #Reinforcement Learning #View Consistency #Agentic Reasoning

June 7, 2026

[Paper Review] SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents

This work addresses the fundamental problem that AI agents with long-term memory fail to accurately understand and use the complex relationships among their accumulated memories.

#Review #Long-Horizon AI Agents #Long-term Memory #Relational Memory #Benchmarking #LLM Agents #Knowledge Discrimination

June 7, 2026

[Paper Review] Streaming Video Generation with Streaming Force Control

This paper proposes StreamForce to address the lack of interactivity and the limits of physical control in existing video generation models.

#Review #Streaming Video Generation #Force Control #Causal Autoregressive Model #Force-aware Distillation #Unified Force Representation

June 7, 2026

[Paper Review] Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

This paper addresses the problem that existing 3D LMMs operate offline, requiring full-scene observations or predefined video clips, which limits their use in real-time settings. Such approaches are hard to use in embedded applications where real-time interaction is essential, such as autonomous robots and AR/VR devices.

#Review #3D Large Multimodal Models #Online Spatial Understanding #Incremental Geometry Priors #Visual-Spatial Feature Integration #Geometry-Adaptive Voxel Compression #Streaming Video

June 7, 2026