Review

[Paper Review] WorldAgents: Can Foundation Image Models be Agents for 3D World Models?

Recent 2D foundation models have demonstrated outstanding high-fidelity image generation and deep semantic understanding through text-to-image diffusion.

#Review #3D World Generation #Foundation Models #Multi-Agent System #Vision-Language Models #3D Consistency #Gaussian Splatting

March 22, 2026

[Paper Review] Versatile Editing of Video Content, Actions, and Dynamics without Training

Despite recent advances in generative video models, non-rigid, dynamic manipulation remains a major challenge: editing actions or dynamic events in real videos, or making inserted content affect the behavior of other objects.

#Review #Video Editing #Training-Free #Inversion-Free #Rectified Flow Models #Similarity Guided Aggregation (SGA) #Annealed Noise Correlation (ANC) #Text-to-Video Flow Models #Dynamic Manipulation

March 22, 2026

[Paper Review] TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

Despite the promise of Vision-Language Models (VLMs) in Earth Observation (EO), existing VLMs struggle with precise pixel-level spatial reasoning and with integrating multi-sensor and multi-temporal data.

#Review #Vision-Language Models (VLMs) #Earth Observation (EO) #Pixel-Grounded Reasoning #Chain-of-Thought (CoT) #Multi-Modal Reasoning #Multi-Temporal Reasoning #Geospatial Reasoning

March 22, 2026

[Paper Review] TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos

Automatically generating a photorealistic, self-consistent appearance for untextured 3D models is an important challenge in digital content creation.

#Review #Video Generation #3D Texturing #Geometric Consistency #Turntable Video #Diffusion Models #Neural Rendering

March 22, 2026

[Paper Review] ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models

This paper proposes ProactiveBench, which restructures seven datasets to evaluate whether MLLMs exhibit "proactiveness", meaning the ability to proactively ask the user for simple help on difficult visual tasks, and analyzes 22 MLLMs.

#Review #MLLM #Benchmark #Proactiveness #Reinforcement Learning #Multimodal Reasoning #Human-AI Interaction

March 22, 2026

[Paper Review] LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

Recent advances in diffusion models have greatly improved text-to-video generation, enabling personalized content creation through fine-grained control over foreground and background elements.

#Review #Personalized Video Generation #Multi-Subject #Face-Attribute Alignment #Diffusion Models #Attention Mechanisms #Relational Embedding #Text-to-Video

March 22, 2026

[Paper Review] LoopRPT: Reinforcement Pre-Training for Looped Language Models

State-of-the-art Large Language Models (LLMs) are trained to "think" through explicit text generation, as in Chain-of-Thought (CoT) prompting.

March 22, 2026

[Paper Review] Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

Recent Large Language Models (LLMs) show remarkable general intelligence and reasoning ability, but their multilingual performance is severely imbalanced.

#Review #LLMs #Multilinguality #Encoder-Decoder #Optimal Transport #Cross-Model Mapping #Language-on-Demand #NMT

March 22, 2026

[Paper Review] Hyperagents

Existing self-improving AI systems mostly rely on a fixed meta agent, which fundamentally limits how far the self-improvement mechanism itself can be improved.

#Review #Hyperagents #Metacognitive Self-modification #Self-improving AI #Open-ended Exploration #Darwin Gödel Machine #Meta-learning #Robotics Reward Design #Olympiad-level Math Grading

March 22, 2026

[Paper Review] How Well Does Generative Recommendation Generalize?

Generative Recommendation (GR) models outperform conventional item ID-based models and have emerged as a promising paradigm in sequential recommendation.

March 22, 2026

[Paper Review] HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Vision-Language Models (VLMs) still struggle with fine-grained, multi-step, complex vision-language reasoning tasks.

#Review #Vision-Language Models #Multi-Hop Reasoning #Data Synthesis #Reinforcement Learning with Verifiable Rewards #Chain-of-Thought #Generalizable Reasoning #Perception-level Hops #Instance-chain Hops

March 22, 2026

[Paper Review] HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering

Long-form video question answering (VideoQA) requires reasoning over extended temporal context, but the finite context windows of current Large Vision-Language Models (LVLMs) make it impossible to process an entire video at its native frame rate.

#Review #Video Question Answering #Frame Selection #Neuro-Symbolic Reasoning #Multimodal Understanding #Long Video

March 22, 2026

[Paper Review] FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow

This paper proposes FlowScene, a tri-branch framework that jointly generates layout, shape, and texture with a single rectified flow. It targets a limitation of existing indoor scene generation methods, which have struggled to achieve precise object-level control and scene-wide style consistency at the same time.

#Review #Scene Generation #Rectified Flow #Multimodal Graph #3D Indoor Synthesis #Style Consistency #Generative Models

March 22, 2026

[Paper Review] EgoForge: Goal-Directed Egocentric World Simulator

Generative world models have made significant progress in simulating and reasoning about dynamic environments, but they struggle with egocentric vision because of rapid viewpoint changes, frequent hand-object interactions, and complex goal-directed behavior that depends on latent human intent.

March 22, 2026

[Paper Review] Deep Tabular Research via Continual Experience-Driven Execution

Large language models (LLMs) have shown considerable ability in reasoning over structured data, but they struggle with complex long-horizon analytical tasks on unstructured tables that contain hierarchical and bidirectional headers, merged cells, and non-canonical layouts.

#Review #Deep Tabular Research #LLM Agents #Tabular Reasoning #Continual Learning #Experience-Driven Execution #Multi-hop Reasoning #Unstructured Tables

March 22, 2026

[Paper Review] CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management

Multimodal Large Language Models (MLLMs) have shown excellent performance in offline video understanding, but they face an inherent bottleneck in streaming video scenarios.

#Review #Streaming Video Understanding #MLLMs #Memory Management #Curvature Score #Hierarchical Visual Memory #Catastrophic Forgetting

March 22, 2026

[Paper Review] Cooperation and Exploitation in LLM Policy Synthesis for Sequential Social Dilemmas

Conventional multi-agent reinforcement learning (MARL) struggles to learn effective policies in Sequential Social Dilemmas (SSDs) because of difficult credit assignment, non-stationarity, and a vast joint action space.

#Review #LLM Policy Synthesis #Sequential Social Dilemmas (SSDs) #Feedback Engineering #Multi-agent Environments #Cooperation #Reward Hacking #Programmatic Policies

March 22, 2026

[Paper Review] Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Discrete diffusion models can generate high-quality data, but they typically require many sampling steps, which leads to high computational cost in FLOPs.

#Review #Discrete Diffusion Models #Distillation #Moment Matching Distillation #D-MMD #GPT-2 Gradient Moment #Few-step Generators #CIFAR-10 #Open Web Text

March 22, 2026

[Paper Review] BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection

As the context windows of LLMs have expanded exponentially, the potential for long-document understanding has grown, but this expansion has also created severe bottlenecks in inference latency and information utilization.

#Review #Prompt Compression #Long-Context LLMs #Training-Free #Hierarchical Selection #Structure-Aware #Inference Latency #Information Utilization

March 22, 2026

[Paper Review] Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Distilled autoregressive (AR) video models enable efficient streaming generation, but they are often misaligned with human visual preferences and exhibit artifacts or unnatural motion dynamics.

#Review #Video Generation #Distilled Autoregressive Models #Reinforcement Learning (RL) #Human Preferences #Streaming Generation #Forward-Process RL #Reward Hacking #Temporal Consistency

March 22, 2026