Review

[Paper Review] WorldGrow: Generating Infinite 3D World

The paper tackles the core challenge of generating infinitely extendable 3D worlds with coherent geometric structure and realistic appearance.

#Review #3D World Generation #Infinite Scene Synthesis #Block-wise Generation #Coarse-to-Fine #3D Inpainting #Structured Latent Representation #Virtual Environments #World Models

October 27, 2025

[Paper Review] Visual Diffusion Models are Geometric Solvers

This paper aims to demonstrate that visual diffusion models can serve as effective solvers for geometric problems.

#Review #Diffusion Models #Geometric Problem Solving #Inscribed Square Problem #Steiner Tree Problem #Maximum Area Polygonization #Image Generation #Pixel Space

October 27, 2025

[Paper Review] Video-As-Prompt: Unified Semantic Control for Video Generation

This paper addresses a key challenge in video generation: unified, generalizable semantic control. It aims to overcome two shortcomings of existing methods, which either impose inappropriate pixel-level priors that produce artifacts, or depend on condition-specific fine-tuning or task-specific architectures that generalize poorly.

#Review #Video Generation #Semantic Control #Diffusion Transformers #In-Context Learning #Mixture-of-Transformers #Video-As-Prompt #Controllable Generation #Large-scale Dataset

October 27, 2025

[Paper Review] UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

This paper addresses a gap in prior GUI grounding research, which overlooked how the diversity and quality of natural-language instructions affect model performance. It aims to fix the 23.3% error rate found in existing instructions and to exploit instruction diversity at inference time for a relative performance gain of up to 76%.

#Review #GUI Grounding #Natural Language Instructions #Multi-Perspective Reasoning #Supervised Fine-Tuning (SFT) #Reinforcement Learning (RL) #Policy Collapse Mitigation #GUI Agents

October 27, 2025

[Paper Review] Taming Modality Entanglement in Continual Audio-Visual Segmentation

This paper proposes Continual Audio-Visual Segmentation (CAVS), a new task designed to address fine-grained modality entanglement.

#Review #Continual Learning #Audio-Visual Segmentation #Modality Entanglement #Semantic Drift #Co-occurrence Confusion #Rehearsal Strategy #Sample Selection

October 27, 2025

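[Paper Review] Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers

This paper aims to resolve the instability that arises when Mixture-of-Experts (MoE) models are trained with reinforcement learning (RL), in particular the policy KL divergence and training collapse caused by mismatched routing behavior between training and inference.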

#Review #MoE #Reinforcement Learning #Training Stability #Routing #Policy Alignment #Rollout Routing Replay #LLMs

October 27, 2025
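The training/inference routing mismatch described in the MoE entry above can be illustrated with a minimal NumPy sketch. It assumes plain top-k softmax routing, and it reads the "Rollout Routing Replay" tag as reusing the expert indices recorded during rollout (inference) when computing training-time gate weights. All function names are hypothetical; this is an illustration of the idea, not the paper's implementation.

```python
import numpy as np


def top_k_experts(router_logits: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring experts per token, sorted for set comparison."""
    idx = np.argsort(router_logits, axis=-1)[:, -k:]
    return np.sort(idx, axis=-1)


def routing_mismatch_rate(train_logits: np.ndarray, infer_logits: np.ndarray, k: int) -> float:
    """Fraction of tokens whose top-k expert set differs between training and inference routers."""
    train_sel = top_k_experts(train_logits, k)
    infer_sel = top_k_experts(infer_logits, k)
    return float(np.mean(np.any(train_sel != infer_sel, axis=-1)))


def replayed_gate_weights(train_logits: np.ndarray, rollout_idx: np.ndarray) -> np.ndarray:
    """Softmax of the training-time logits restricted to the experts recorded at rollout (hypothetical replay)."""
    chosen = np.take_along_axis(train_logits, rollout_idx, axis=-1)
    chosen = chosen - chosen.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(chosen)
    return weights / weights.sum(axis=-1, keepdims=True)


# Toy example: 1 token, 4 experts, top-2 routing; a numerical discrepancy swaps experts 1 and 2.
train_logits = np.array([[3.0, 2.0, 1.0, 0.0]])
infer_logits = np.array([[3.0, 1.0, 2.0, 0.0]])
print(routing_mismatch_rate(train_logits, infer_logits, k=2))  # 1.0
rollout_idx = top_k_experts(infer_logits, k=2)  # experts [0, 2] chosen during rollout
print(replayed_gate_weights(train_logits, rollout_idx))
```

Because the gate weights are computed only over the rollout's experts, the training forward pass routes through the same experts that generated the sample, even where its own top-k would have differed.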

[Paper Review] Sparser Block-Sparse Attention via Token Permutation

This paper addresses the heavy computational cost and memory bottleneck that the O(N^2) self-attention mechanism imposes on LLMs when processing long contexts.

#Review #Large Language Models (LLMs) #Self-Attention #Block-Sparse Attention #Token Permutation #Computational Efficiency #Prefilling #Long Context #Causal Attention

October 27, 2025
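The O(N^2) cost in the block-sparse attention entry above, and why block sparsity helps, can be made concrete with a small counting sketch. It assumes a causal mask, square tiles of `block` tokens, and a hypothetical fixed budget `keep` of key blocks per query block; the paper's token-permutation step is not modeled here.

```python
import math


def dense_pairs(n: int) -> int:
    """Query-key scores computed by full self-attention over n tokens: n * n."""
    return n * n


def causal_tiles(n: int, block: int) -> int:
    """Lower-triangular (query-block, key-block) tiles a causal block kernel must visit."""
    nb = math.ceil(n / block)
    return nb * (nb + 1) // 2


def sparse_tiles(n: int, block: int, keep: int) -> int:
    """Tiles visited if each query block keeps at most `keep` causal key blocks (hypothetical budget)."""
    nb = math.ceil(n / block)
    # Query block i (0-based) can only attend to key blocks 0..i under a causal mask.
    return sum(min(keep, i + 1) for i in range(nb))


for n in (1024, 2048, 4096):
    print(n, dense_pairs(n), causal_tiles(n, 64), sparse_tiles(n, 64, keep=4))
```

Doubling N roughly quadruples the dense and causal counts, while the budgeted sparse count only about doubles; how many blocks can be skipped without hurting accuracy depends on how concentrated the attention mass is within blocks.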

[Paper Review] Soft Instruction De-escalation Defense

This paper aims to address the vulnerability of LLM-based agent systems that interact with external environments to prompt injection attacks. In particular, it proposes a defense mechanism that neutralizes malicious instructions embedded in untrusted data without degrading the agent's utility.

#Review #Prompt Injection #LLM Security #Agentic Systems #Iterative Sanitization #Instruction Control #Adversarial Robustness #Large Language Models

October 27, 2025

[Paper Review] Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

This paper aims to resolve two major limitations of GRPO (Group Relative Policy Optimization) in flow-matching-based text-to-image (T2I) generation: inaccurate advantage attribution and neglect of the temporal dynamics of the generation process.

#Review #Text-to-Image Generation #Reinforcement Learning #GRPO #Flow Matching #Chunk-level Optimization #Temporal Dynamics #Diffusion Models

October 27, 2025
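For context on the Chunk-Level GRPO entry above, here is a minimal sketch of the standard group-relative advantage that GRPO computes, plus the step-level baseline in which one trajectory-level advantage is copied to every denoising step. Reading that copying as the "inaccurate advantage attribution" the entry mentions is an assumption on our part; the paper's chunking scheme is not shown. Normalization details (for example, sample vs. population standard deviation) vary between implementations.

```python
import numpy as np


def group_relative_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    """Normalize rewards within one prompt's group of samples: (r - mean) / (std + eps)."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)  # population std (ddof=0)


def broadcast_to_steps(advantages: np.ndarray, num_steps: int) -> np.ndarray:
    """Assign each trajectory's single advantage to every denoising step (shape: G x T)."""
    return np.repeat(advantages[:, None], num_steps, axis=1)


# Four images sampled for the same prompt, scored by some reward model (toy numbers).
adv = group_relative_advantages([1.0, 2.0, 3.0, 4.0])
per_step = broadcast_to_steps(adv, num_steps=10)
print(adv.round(3))
print(per_step.shape)  # (4, 10)
```

Every column of `per_step` is identical, so early and late denoising steps receive the same credit regardless of how much each one actually shaped the final image.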

[Paper Review] Reasoning with Sampling: Your Base Model is Smarter Than You Think

This paper asks whether RL post-training of LLMs genuinely instills new reasoning abilities, or merely "sharpens" capabilities the base model already has.

#Review #LLMs #MCMC #Sampling #Reasoning #Distribution Sharpening #Reinforcement Learning (RL) #Inference-time Optimization #Training-free

October 27, 2025
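"Sharpening" in the entry above can be illustrated on a toy categorical distribution. This sketch assumes sharpening means raising probabilities to a power alpha > 1 and renormalizing; it only shows the effect on a small explicit distribution, not the MCMC procedure needed to sample from such a distribution over full LLM outputs.

```python
import numpy as np


def sharpen(probs, alpha: float) -> np.ndarray:
    """Power distribution p^alpha / sum(p^alpha); alpha > 1 concentrates mass on likely outcomes."""
    p = np.asarray(probs, dtype=np.float64)
    powered = p ** alpha
    return powered / powered.sum()


base = np.array([0.5, 0.3, 0.2])  # toy distribution over three candidate answers
for alpha in (1.0, 2.0, 4.0):
    print(alpha, sharpen(base, alpha).round(3))
```

Sharpening never changes which outcome is most likely; it only shifts mass toward outcomes the base model already favors, which is the distinction the paper's question turns on.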

[Paper Review] RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

The paper aims to address catastrophic forgetting (CF) in large language models (LLMs) under continual learning and multi-domain settings.

#Review #Catastrophic Forgetting #Continual Learning #Model Merging #LLMs #Representation Learning #Data-free Learning #Hierarchical Parameter Fusion

October 27, 2025

[Paper Review] RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling

This paper addresses the problem that user-provided prompts are short, unstructured, and misaligned with the training data, which limits the generative potential of diffusion-based T2V models. It proposes a prompt optimization framework that substantially improves T2V generation quality without modifying the generative backbone.

#Review #Text-to-Video Generation #Prompt Optimization #Large Language Models (LLM) #Test-Time Scaling #Retrieval-Augmented Generation #Diffusion Models #Data Alignment

October 27, 2025

[Paper Review] PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis

The paper aims to overcome the data scarcity involved in learning physically consistent dynamics models of deformable objects from limited real-world video, and to build world models that are both accurate and fast at inference. It focuses in particular on objects whose physical properties vary across space and time.

#Review #World Models #Deformable Objects #Physics Simulation #GNN #Digital Twin #Data Synthesis #Real-to-Sim #Physics-Aware Learning

October 27, 2025

[Paper Review] PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments

This work points out that existing MLLMs are limited to static, fully observable environments, which leaves them poorly equipped for the incomplete information typical of real physical environments.

#Review #Active Visual Reasoning #MLLM #Physical Environments #Partially Observable #Markov Decision Process #Chain-of-Thought #Embodied AI #CLEVR-AVR

October 27, 2025

[Paper Review] Model Merging with Functional Dual Anchors

This paper aims to resolve the parameter conflicts and task-specific knowledge conflicts that arise in model merging, the process of integrating knowledge from fine-tuned checkpoints of a foundation model.

#Review #Model Merging #Functional Dual Anchors #Input-Representation Space #Task Vectors #Knowledge Integration #Foundation Models #Gradient Matching #Post-training Strategy

October 27, 2025
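To make the "parameter conflicts" in the model-merging entry above concrete, here is a minimal sketch of the common task-arithmetic baseline (tagged "Task Vectors"): each fine-tuned checkpoint contributes its difference from the shared base, scaled by a coefficient `lam`. This is not the Functional Dual Anchors method, and `lam` and the toy weights are hypothetical.

```python
import numpy as np


def task_vector(finetuned: dict, base: dict) -> dict:
    """Per-parameter difference between a fine-tuned checkpoint and the shared base."""
    return {name: finetuned[name] - base[name] for name in base}


def task_arithmetic_merge(base: dict, finetuned_models: list, lam: float = 0.3) -> dict:
    """Add the scaled sum of all task vectors to a copy of the base weights."""
    merged = {name: np.array(w, dtype=np.float64) for name, w in base.items()}
    for ft in finetuned_models:
        for name, delta in task_vector(ft, base).items():
            merged[name] += lam * delta
    return merged


base = {"w": np.array([0.0, 0.0])}
task_a = {"w": np.array([1.0, 1.0])}   # pushes both weights up
task_b = {"w": np.array([0.0, -1.0])}  # pushes the second weight down: a sign conflict
merged = task_arithmetic_merge(base, [task_a, task_b], lam=0.5)
print(merged["w"])
```

The first weight keeps half of task A's update, but the second weight's opposing updates cancel to zero, so neither task's adjustment survives there. This kind of cancellation is one form of the conflict the paper targets.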

[Paper Review] Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs

This paper aims to uncover how Video Large Language Models (VideoLLMs) internally extract and propagate video-text information (spatiotemporal inputs) to perform temporal reasoning in video question answering (VideoQA) tasks.

#Review #Video Large Language Models #VideoQA #Mechanistic Interpretability #Attention Knockout #Temporal Reasoning #Information Flow #Model Interpretability #Logit Lens

October 27, 2025

[Paper Review] From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

This paper aims to resolve the error cascades caused by train-inference mismatch in vision-language diffusion models. In particular, it targets cases in parallel decoding where early token errors contaminate the entire generation context, leading to syntactic errors and semantic hallucinations.

#Review #Discrete Diffusion Models #Vision-Language Models #Error Cascades #Self-Correction #Refinement Framework #Parallel Generation #Image Captioning #Hallucination Mitigation

October 27, 2025

[Paper Review] Foley Control: Aligning a Frozen Latent Text-to-Audio Model to Video

This paper proposes a lightweight approach to video-guided Foley sound synthesis that keeps a pretrained text-to-audio (T2A) model frozen.

#Review #Text-to-Audio #Video-to-Audio #Foley Synthesis #Diffusion Models #Cross-Attention #Frozen Backbones #Video Embeddings #Rotary Position Embeddings

October 27, 2025

[Paper Review] Document Understanding, Measurement, and Manipulation Using Category Theory

This paper aims to develop a mathematical framework, built on category theory, that extracts the structure of documents, measures their information content, and enables manipulations such as summarization and extension (exegesis).

#Review #Category Theory #Document Understanding #Large Language Models #Information Theory #Rhetorical Structure Theory #Document Summarization #Rate Distortion Analysis #Self-supervised Learning

October 27, 2025

[Paper Review] DeepAgent: A General Reasoning Agent with Scalable Toolsets

The paper aims to overcome the limitations of existing LLM-based agents: rigid, predefined workflows, no dynamic tool discovery, and inefficient long-horizon interaction and memory management.

#Review #Autonomous Agents #Large Language Models #Tool Use #Reinforcement Learning #Memory Management #Tool Retrieval #Agentic Reasoning

October 27, 2025