Review

[Paper Review] Video Generation with Predictive Latents

This paper addresses a problem with existing video VAEs: optimizing only for visual reconstruction quality does not guarantee strong video generation (generative performance).

#Review #Video Generation #Video VAE #Predictive Learning #Latent Diffusion Models #Temporal Dynamics #Motion Prior #Spatiotemporal Compression

May 5, 2026

[Paper Review] The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail

This paper addresses the very poor performance of commercial and open-source STT systems at recognizing specific entities in Indic languages. These systems are trained mostly on read-prose data such as Wikipedia and news. As a result, they are weak on the entity-heavy data that appears frequently in real-world deployments.

#Review #Indic ASR #TTS-STT Flywheel #Entity-Dense Audio #LoRA #Script Fidelity Rate #Data Augmentation #Entity-Hit-Rate

May 5, 2026

[Paper Review] TCDA: Thread-Constrained Discourse-Aware Modeling for Conversational Sentiment Quadruple Analysis

This paper targets two problems in the DiaASQ task: structural noise and distance decay. Both arise because existing models fail to capture the complex dependencies within a conversation. Prior GCN-based work propagates irrelevant cross-thread information without filtering it, which introduces structural noise.

#Review #DiaASQ #TC-DAG #D-RoPE #Distance Dilution #Sentiment Analysis #Conversational AI #Discourse Modeling

May 5, 2026

[Paper Review] SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment

This study aims to validate, at a clinical standard, how well a conversational AI diagnostic agent performs when it works from the symptoms users report in everyday life.

#Review #Conversational AI #Differential Diagnosis (DDx) #LLM #Fitbit #Wearable Biosignals #PheWAS #Healthcare AI

May 5, 2026

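[Paper Review] StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing

This paper proposes a fully online neural compression method. It responds to two drawbacks of large LLM-based compression: the enormous compute it requires and the impracticality of shipping external weights. Existing high-performance neural compressors must load hundreds of millions of parameters from an external source, which makes them hard to use in general-purpose settings.

#Review #Lossless Compression #State Space Models #Mamba #Online Learning #Arithmetic Coding #N-gram #BPE Tokenisation

May 5, 2026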

[Paper Review] Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO

Skills are proliferating in the LLM-based agent ecosystem, but individual developers design each one for a narrow purpose. This study addresses the resulting functional fragmentation and gaps in coverage.

#Review #Large Language Model #Agent #Skill Self-Evolution #GRPO #Benchmark #Automation

May 5, 2026

[Paper Review] SVGS: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors

This paper addresses the inefficiency of existing Gaussian Splatting methods when representing complex textures or geometry.

#Review #Gaussian Splatting #Novel-view Synthesis #Spatially Varying #Gaussian Surfels #Movable Kernels #3D Reconstruction

May 5, 2026

[Paper Review] Reinforcement Learning for LLM-based Multi-Agent Systems through Orchestration Traces

LLM-based agents are evolving from individual tool use into coordinated teams. This paper argues that, in this setting, both single-agent RL and classical MARL methods fall short.

#Review #LLM #Multi-Agent Systems #Reinforcement Learning #Orchestration Trace #Credit Assignment #Reward Design #System Engineering

May 5, 2026

[Paper Review] PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination

Prior patent research treats examination as either binary classification (acceptance prediction) or static information extraction. This paper addresses that limitation, since neither framing captures the iterative, interactive examination process used in practice.

#Review #Patent Examination #Office Action Generation #Rebuttal Generation #Large Language Models #Legal Reasoning #Benchmark

May 5, 2026

[Paper Review] OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories

This study takes a critical look at how building high-performance search agents has come to depend on CPT+SFT+RL pipelines run by companies with vast capital and compute. Such complex training recipes raise the barrier to entry for academia and make the research ecosystem more closed.

#Review #Search Agent #SFT #ReAct #Data Quality #Long-horizon Reasoning #Data Synthesis

May 5, 2026

[Paper Review] HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness

This paper seeks to identify, and then simplify, the mechanism that actually drives performance behind modern, complex agentic harness designs. Existing orchestration systems are so complex that their underlying reasoning mechanism is hard to discern.

#Review #Agentic Harness #Heavy Thinking #Large Language Model #Test-Time Scaling #Sequential Deliberation #Parallel Reasoning #RLVR

May 5, 2026

[Paper Review] Healthcare AI GYM for Medical Agents

This paper addresses the difficulty medical AI agents have in learning stable tool-use policies in complex, multi-step clinical reasoning environments. Prior single-turn medical QA work does not adequately capture multi-step interaction and tool use, which are central to real clinical practice.

#Review #Medical AI Agents #Reinforcement Learning #On-Policy Distillation #Clinical Reasoning #Multi-turn Interaction #Healthcare AI GYM

May 5, 2026

[Paper Review] ESARBench: A Benchmark for Agentic UAV Embodied Search and Rescue

This paper notes that prior UAV SAR research has been confined to traditional vision and path-planning approaches. It also observes that no unified benchmark exists for evaluating autonomous decision-making in complex environments.

#Review #Embodied AI #Search and Rescue (SAR) #UAV #Multimodal Large Language Models (MLLMs) #Simulation Platform #Benchmark

May 5, 2026

[Paper Review] Chain of Evidence: Pixel-Level Visual Attribution for Iterative Retrieval-Augmented Generation

This paper is designed to address two problems in existing text-based iRAG systems: coarse-grained attribution and visual semantic loss.

#Review #Iterative Retrieval-Augmented Generation #Visual Attribution #Vision-Language Models #Pixel-level Grounding #Multi-hop Reasoning

May 5, 2026

[Paper Review] Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

This paper addresses the distributional drift that arises in SFT→RLVR, the standard post-training pipeline for LMMs. Conventional SFT relies on a uniform token-level objective, which leads the model to learn only superficial patterns and ends up distorting its original capabilities.

#Review #Multimodal LLM #Reinforcement Learning #On-Policy Distillation #Distributional Drift #Mixture-of-Experts (MoE) #Adversarial Alignment

May 5, 2026

[Paper Review] A Benchmark for Interactive World Models with a Unified Action Generation Framework

This paper addresses the difficulty of objectively evaluating the physical-interaction capabilities of interactive world models. That difficulty stems from the lack of large-scale datasets and a unified benchmark.

#Review #Interactive World Models #Benchmark #Action Generation Framework #Embodied Intelligence #Trajectory Following #Memory Ability

May 5, 2026

[Paper Review] T^2PO: Uncertainty-Guided Exploration Control for Stable Multi-Turn Agentic Reinforcement Learning

This paper addresses the training collapse that frequently occurs in multi-turn agentic RL settings.

#Review #Agentic Reinforcement Learning #Multi-Turn Reasoning #Uncertainty-Guided Exploration #Token-Level Thinking Intervention #Turn-Level Dynamical Sampling #Training Stability

May 4, 2026

[Paper Review] Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling

This paper tackles a strategic dilemma in LLM training for non-English languages that are high-resource yet data-limited, such as German. The trade-off is between securing data diversity and strengthening data quality.

#Review #Large Language Models #Data Filtering #Sample Efficiency #German Language Modeling #Multi-Epoch Training #Semantic Density #High-Signal Data

May 4, 2026

[Paper Review] PhysicianBench: Evaluating LLM Agents in Real-World EHR Environments

Existing medical AI benchmarks are limited to static knowledge recall or single-step tasks. This paper addresses that limitation, because such benchmarks cannot evaluate the complex, long-horizon clinical workflows required in real medical settings.

#Review #LLM Agents #EHR #Benchmark #FHIR #Clinical Workflows #Agentic Evaluation #Long-horizon Tasks

May 4, 2026

[Paper Review] Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

This paper addresses the visual signal dilution that autoregressive LVLMs suffer when generating long outputs.

#Review #Large Vision-Language Models #Visual Signal Dilution #Persistent Visual Memory #Autoregressive Generation #Multimodal Reasoning #Bottleneck Adapter

May 4, 2026