Review

[Paper Review] OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning

This paper addresses a gap in evaluation. There are no systematic tools for measuring instruction following, meaning how well omni-modal models comply with complex user instructions.

#Review #Omni-modal Large Language Models #Instruction Following #Video Captioning #Temporal Grounding #Constraint Framework #Format-Content Tradeoff

June 8, 2026

[Paper Review] OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation

This paper tackles the shortage of large-scale, high-quality demonstration data for humanoid robot loco-manipulation tasks.

#Review #Humanoid Loco-Manipulation #Simulation Data Collection #Zero-Shot Transfer #Domain Randomization #Visuomotor Policy #Flow Matching #Unitree G1

June 8, 2026

[Paper Review] Light-WAM: Efficient World Action Models with State-Fusion Action Decoding

Existing World Action Models (WAMs) are built on large generative architectures, which makes them expensive to train and slow at inference. This paper proposes Light-WAM to address both problems.

#Review #World Action Models #Robot Manipulation #State-Fusion Action Decoding #Efficient Inference #Latent Space Supervision #Video Co-training

June 8, 2026

[Paper Review] Liberating LLM Capabilities in Full-Duplex Speech Models

This paper observes that existing speech-based LLMs are confined to a single, limited output channel: spoken responses. As a result, they cannot fully exploit the structural and logical strengths of text.

#Review #Full-Duplex #Speech LLM #Visible Writing #Tri-channel Paradigm #Token Schema #Real-time Interaction

June 8, 2026

[Paper Review] Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

This paper addresses the lack of methods for formally modeling, verifying, and debugging LLM agent workflows and their execution trajectories.

#Review #Formal Methods #LLM Agent #Lean4 #Workflow Verification #Trajectory Analysis #FormalAgentLib #LeanEvolve

June 8, 2026

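[Paper Review] LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents

LLM agents often inject external skills directly into their prompts, which adds context overhead and exposes the skills to security risks. This paper aims to solve both problems. Existing in-context skill methods have two drawbacks:

- They must insert the skill text at every step, which raises inference cost.
- They expose the skill content verbatim in the prompt, which leaves it vulnerable to attacks.

#Review #LLM Agents #LoRA #Hypernetworks #Skill Composition #Weight Space #Prompt Efficiency #Modular Learning

June 8, 2026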

[Paper Review] Latent Spatial Memory for Video World Models

Existing video world models struggle to maintain 3D spatial consistency and are computationally expensive. This paper proposes Mirage to address both issues.

#Review #Video Generation #Spatial Memory #3D-consistent Video Generation #Video World Models #Latent Space #Diffusion Models

June 8, 2026

[Paper Review] Human Psychometric Questionnaires Mischaracterize LLM Behavior

Human psychometric questionnaires are commonly used to assess the values and personality of LLMs. This paper questions whether these questionnaires reliably predict how LLMs actually behave in real user interactions.

#Review #LLM #Psychometrics #Value Portrait #Generation Probability #Alignment #Construct Validity

June 8, 2026

[Paper Review] Honest Lying: Understanding Memory Confabulation in Reflexive Agents

This paper addresses "Memory Confabulation," which arises when agents such as Reflexion rely on self-generated feedback. Prior work assumes that an agent can accurately diagnose its own failures. The authors show that this assumption can fail systematically.

#Review #Reflexive Agents #Memory Confabulation #Reflexion #ALFWorld #LLM Agents #Programmatic Feedback Extraction #Reflection Repetition Rate

June 8, 2026

[Paper Review] Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

This paper points out that the outcome verifiers in modern agent benchmarks are vulnerable, and it proposes a systematic method for hardening them automatically. Existing practice is reactive: each time a new type of attack is discovered, developers patch the verifier by hand. That approach is hard to scale.

#Review #Agentic Evaluation #Reward Hacking #Adversarial Robustness #LLM Benchmarks #Hacker-Fixer Loop #Verifiers #Defense Pool

June 8, 2026

[Paper Review] FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

This paper aims to solve the KV cache memory bottleneck that arises when processing ultra-long contexts. Existing LLMs must keep the entire historical context resident in GPU memory. GPU memory requirements therefore grow linearly with context length, which is a critical limitation.
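As a rough illustration of why the memory requirement scales linearly, the sketch below estimates KV cache size for standard multi-head or grouped-query attention. The helper `kv_cache_bytes` is hypothetical. The configuration values (32 layers, 8 KV heads, head dimension 128, FP16) are assumptions and do not come from the paper or from DeepSeek-V4, whose attention design may store a compressed cache rather than full per-head keys and values.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2,
                   batch_size: int = 1) -> int:
    """Estimate KV cache size for standard MHA/GQA attention (hypothetical helper)."""
    # Every layer caches one key vector and one value vector per KV head per token,
    # hence the leading factor of 2.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    # Total memory = per-token cost x number of cached tokens, so it is linear in seq_len.
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    # Assumed config: 32 layers, 8 KV heads, head_dim 128, FP16 (2 bytes per element).
    for seq_len in (8_192, 131_072, 1_048_576):
        gib = kv_cache_bytes(32, 8, 128, seq_len) / 2**30
        print(f"{seq_len:>9} tokens -> {gib:.1f} GiB")
```

Under these assumptions a single sequence needs 1.0 GiB at 8K tokens, 16.0 GiB at 128K tokens, and 128.0 GiB at 1M tokens. This linear growth is the bottleneck that keeping the full context on the GPU runs into.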

#Review #Large Language Models #Ultra-Long Context #Sparse Attention #KV Cache Compression #Lookahead Sparse Attention #Neural Memory Indexer #Decoupled Training

June 8, 2026

[Paper Review] Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

Existing medical agents rely on static knowledge or short-term memory. As a result, they cannot effectively accumulate long-term experience in complex clinical situations. This paper aims to overcome that limitation.

#Review #Medical Agent #Skill Memory #Self-Evolving #Clinical Reasoning #Value-aware Retrieval #Trajectory-to-Skill Distillation #Non-parametric Reinforcement

June 8, 2026

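[Paper Review] Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

The current AI evaluation ecosystem is fragmented, which makes model performance metrics hard to trust or compare. This paper aims to address that fragmentation. Prior work either covers only specific aspects of evaluation or stops at static reports. Neither approach systematically integrates the data produced by real evaluation pipelines.

#Review #AI Evaluation #Reporting Framework #Reproducibility #Transparency #Interpretive Layer #Benchmark Metadata #Rollout Hierarchy

June 8, 2026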

[Paper Review] End-to-End Context Compression at Scale

Long-context processing is a core capability of LLMs. However, the KV Cache memory footprint grows rapidly with context length, and inference slows down as a result. This work aims to solve that problem.

#Review #Context Compression #KV Cache #Latent Context Language Models #Encoder-Decoder #End-to-End Training #Model Efficiency

June 8, 2026

[Paper Review] EmpiriGraph-Psy: A Dataset and LLM Pipeline for Extracting Empirical Relation Graphs from Psychology Abstracts

This paper proposes EmpiriGraph-Psy to structure scientific knowledge in variable-oriented disciplines such as psychology.

#Review #Scientific Relation Extraction #Knowledge Graphs #Psychology #LLM Pipeline #Empirical Research #Variable Normalization

June 8, 2026

[Paper Review] Echo-Memory: A Controlled Study of Memory in Action World Models

This paper studies a fundamental problem: memory failures in action world models. Prior studies used different backbones, training recipes, and evaluation protocols, so their memory performance could not be compared accurately.

#Review #Action World Models #Video Diffusion #Memory Mechanism #Open-domain Return #Replay Consistency #State-Space Memory #Context Compression

June 8, 2026

[Paper Review] EMMA: Extracting Multiple physical parameters from Multimodal Data

This work addresses how to estimate the physical parameters of real-world systems, such as autonomous driving platforms and drones, accurately from fragmented multimodal data.

#Review #Multimodal Data #Physical Parameter Extraction #Liquid Time-Constant Network #Physics-Informed #Digital Twin #Implicit Dynamics #Forced Dynamical Systems

June 8, 2026

[Paper Review] DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

This paper aims to resolve four key limitations of existing Deep Research (DR) systems. The first is the complexity of long-horizon planning when the research scope is underspecified. The second is the risk of error propagation during subtask decomposition and scheduling in a single-agent setting.

#Review #Deep Research #Multi-Agent System #Graph-Based Dynamic Planning #Recursive Execution #Rubric-Grounded Reasoning #Auditability #Test-Time Optimization

June 8, 2026

[Paper Review] DEI: Diversity in Evolutionary Inference for Quality-Diversity Search

Existing parallel LLM-based search focuses only on scaling compute and overlooks the cognitive diversity of the models involved. This paper aims to address that gap.

#Review #Quality-Diversity Search #Large Language Models #Evolutionary Algorithms #Digital Red Queen #Heterogeneous Ensemble #Distributed Optimization

June 8, 2026

[Paper Review] Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents

In the LVR (Latent Visual Reasoning) framework, cosine similarity is used to measure alignment between latents and their targets. This paper addresses the "misleading" phenomenon in which that metric fails to reflect model performance.

#Review #Vision-Language Models #Latent Visual Reasoning #Information Bottleneck #Linear Probing #Auxiliary Loss #Faithfulness #Diagnostic

June 8, 2026