Review

[Paper Review] TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

Despite the promise of Vision-Language Models (VLMs) for Earth Observation (EO), existing VLMs struggle with precise pixel-level spatial reasoning and with integrating multi-sensor and multi-temporal data.

#Review #Vision-Language Models (VLMs) #Earth Observation (EO) #Pixel-Grounded Reasoning #Chain-of-Thought (CoT) #Multi-Modal Reasoning #Multi-Temporal Reasoning #Geospatial Reasoning

March 22, 2026

[Paper Review] TAPESTRY: From Geometry to Appearance via Consistent Turntable Videos

Automatically generating a photorealistic, self-consistent appearance for untextured 3D models is an important challenge in digital content creation.

#Review #Video Generation #3D Texturing #Geometric Consistency #Turntable Video #Diffusion Models #Neural Rendering

March 22, 2026

[Paper Review] ProactiveBench: Benchmarking Proactiveness in Multimodal Large Language Models

This paper proposes ProactiveBench, built by restructuring 7 datasets, to evaluate whether MLLMs have "proactiveness", the ability to ask the user for simple help when facing a difficult visual task, and analyzes 22 MLLMs on it.

#Review #MLLM #Benchmark #Proactiveness #Reinforcement Learning #Multimodal Reasoning #Human-AI Interaction

March 22, 2026

[Paper Review] LumosX: Relate Any Identities with Their Attributes for Personalized Video Generation

Recent advances in diffusion models have greatly improved text-to-video generation, enabling personalized content creation through fine-grained control over foreground and background elements.

#Review #Personalized Video Generation #Multi-Subject #Face-Attribute Alignment #Diffusion Models #Attention Mechanisms #Relational Embedding #Text-to-Video

March 22, 2026

[Paper Review] LoopRPT: Reinforcement Pre-Training for Looped Language Models

Modern Large Language Models (LLMs) are trained to "think" through explicit text generation, as in Chain-of-Thought (CoT) prompting.

March 22, 2026

[Paper Review] Language on Demand, Knowledge at Core: Composing LLMs with Encoder-Decoder Translation Models for Extensible Multilinguality

Recent Large Language Models (LLMs) show strong general intelligence and reasoning ability, but their performance is severely imbalanced across languages.

#Review #LLMs #Multilinguality #Encoder-Decoder #Optimal Transport #Cross-Model Mapping #Language-on-Demand #NMT

March 22, 2026

[Paper Review] Hyperagents

Most existing self-improving AI systems rely on a fixed meta agent, which fundamentally limits how far the self-improvement mechanism itself can be improved.

#Review #Hyperagents #Metacognitive Self-modification #Self-improving AI #Open-ended Exploration #Darwin Gödel Machine #Meta-learning #Robotics Reward Design #Olympiad-level Math Grading

March 22, 2026

[Paper Review] How Well Does Generative Recommendation Generalize?

Generative Recommendation (GR) models outperform conventional item ID-based models and have emerged as a promising paradigm for sequential recommendation.

March 22, 2026

[Paper Review] HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Vision-Language Models (VLMs) still struggle with fine-grained, multi-step, complex vision-language reasoning tasks.

#Review #Vision-Language Models #Multi-Hop Reasoning #Data Synthesis #Reinforcement Learning with Verifiable Rewards #Chain-of-Thought #Generalizable Reasoning #Perception-level Hops #Instance-chain Hops

March 22, 2026

[Paper Review] HiMu: Hierarchical Multimodal Frame Selection for Long Video Question Answering

Long-form video question answering (VideoQA) requires reasoning over an extended temporal context, but the finite context windows of current Large Vision-Language Models (LVLMs) make it infeasible to process an entire video at its native frame rate.

#Review #Video Question Answering #Frame Selection #Neuro-Symbolic Reasoning #Multimodal Understanding #Long Video

March 22, 2026

[Paper Review] FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow

Prior indoor scene generation methods have found it difficult to achieve precise object-level control and scene-wide style consistency at the same time. To overcome this, the paper proposes FlowScene, a tri-branch framework that jointly generates layout, shape, and texture with a single rectified flow.

#Review #Scene Generation #Rectified Flow #Multimodal Graph #3D Indoor Synthesis #Style Consistency #Generative Models

March 22, 2026

[Paper Review] EgoForge: Goal-Directed Egocentric World Simulator

Generative world models have made significant progress in simulating and reasoning about dynamic environments, but egocentric vision remains difficult because of rapid viewpoint changes, frequent hand-object interactions, and complex goal-directed behavior that depends on latent human intent.

March 22, 2026

[Paper Review] Deep Tabular Research via Continual Experience-Driven Execution

Large language models (LLMs) have shown considerable ability to reason over structured data, but they struggle with complex long-horizon analytical tasks on unstructured tables that contain hierarchical and bidirectional headers, merged cells, and non-canonical layouts.

#Review #Deep Tabular Research #LLM Agents #Tabular Reasoning #Continual Learning #Experience-Driven Execution #Multi-hop Reasoning #Unstructured Tables

March 22, 2026

[Paper Review] CurveStream: Boosting Streaming Video Understanding in MLLMs via Curvature-Aware Hierarchical Visual Memory Management

Multimodal Large Language Models (MLLMs) perform well on offline video understanding but face an inherent bottleneck in streaming video scenarios.

#Review #Streaming Video Understanding #MLLMs #Memory Management #Curvature Score #Hierarchical Visual Memory #Catastrophic Forgetting

March 22, 2026

[Paper Review] Cooperation and Exploitation in LLM Policy Synthesis for Sequential Social Dilemmas

Conventional multi-agent reinforcement learning (MARL) has difficulty learning effective policies in Sequential Social Dilemma (SSD) environments because of hard credit assignment, non-stationarity, and a vast joint action space.

#Review #LLM Policy Synthesis #Sequential Social Dilemmas (SSDs) #Feedback Engineering #Multi-agent Environments #Cooperation #Reward Hacking #Programmatic Policies

March 22, 2026

[Paper Review] Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Discrete diffusion models can generate high-quality data, but they typically require many sampling steps, which leads to high computational cost in FLOPs.

#Review #Discrete Diffusion Models #Distillation #Moment Matching Distillation #D-MMD #GPT-2 Gradient Moment #Few-step Generators #CIFAR-10 #OpenWebText

March 22, 2026

[Paper Review] BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection

As LLM context windows have grown dramatically, the potential for long-document understanding has increased, but so have serious bottlenecks in inference latency and information utilization.

#Review #Prompt Compression #Long-Context LLMs #Training-Free #Hierarchical Selection #Structure-Aware #Inference Latency #Information Utilization

March 22, 2026

[Paper Review] Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

Distilled autoregressive (AR) video models enable efficient streaming generation, but they are often misaligned with human visual preferences, producing artifacts or unnatural motion dynamics.

#Review #Video Generation #Distilled Autoregressive Models #Reinforcement Learning (RL) #Human Preferences #Streaming Generation #Forward-Process RL #Reward Hacking #Temporal Consistency

March 22, 2026

[Paper Review] AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science

This paper proposes AgentDS, a benchmark for evaluating how closely AI agents can match human experts on domain-specific data science tasks and in which areas human expertise still holds the advantage.

#Review #AI Agents #Human-AI Collaboration #Data Science Benchmark #Large Language Models #Domain-Specific Reasoning #Multi-Industry Evaluation

March 22, 2026

[Paper Review] A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

Large language model (LLM)-based agents have emerged as powerful autonomous controllers in digital environments, but they show weak long-horizon planning, especially on complex tasks such as web navigation that involve dynamic content and long action sequences.

#Review #LLM Agents #Subgoals #Reinforcement Learning #Web Navigation #Long-Horizon Planning #Reward Shaping #Curriculum Learning

March 22, 2026