Latest Posts

[Paper Review] iMaC: Translating Actions into Motion and Contact Images for Embodied World Models

This paper tackles the uncertainty of action-conditioned video generation that Embodied World Models face when they are used to evaluate robot policies.

#Review #Embodied World Models #Action-Conditioned Video Generation #Robot Policy Evaluation #Motion Images #Contact Images #URDF/FK #Long-Horizon Manipulation

June 14, 2026

[Paper Review] World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

This paper aims to resolve the trade-off between faithfulness and completeness in existing single-image 3D estimation methods.

#Review #World Tracing #Pixel-Aligned #Geometry Generation #Diffusion Transformer #Flow Matching #Multilayer #3D Vision

June 14, 2026

[Paper Review] When is Your LLM Steerable?

This work addresses the fragility of Activation Steering, whose success depends on the combined effect of the model, the prompt, the target concept, and the steering strength.

#Review #Activation Steering #Steerability Prediction #LLM Inference #Gradient Boosting Decision Trees #ASTEER Dataset #SteerBoost

June 14, 2026

[Paper Review] WaveDiT: Distribution-Aware Wavelet Flow Matching for Efficient 3D Brain MRI Synthesis

This paper proposes WaveDiT to address the high computational cost of 3D MRI synthesis and the loss of fine anatomical detail it typically incurs.

#Review #3D MRI Synthesis #Flow Matching #Discrete Wavelet Transform #Heteroscedastic Uncertainty #Generative Models #Brain Age Prediction

June 14, 2026

[Paper Review] VISTA: View-Consistent Self-Verified Training for GUI Grounding

This paper focuses on the reward degeneracy problem that arises when GUI grounding models are trained with GRPO.

#Review #GUI Grounding #GRPO #Self-Verified Training #View-Consistent #Reinforcement Learning #VLM

June 14, 2026

[Paper Review] The Hidden Power of Scaling Factor in LoRA Optimization

This paper points out that the role of the scaling factor $\alpha$, a LoRA training hyperparameter, has never been studied systematically and has instead been treated merely as an auxiliary to the learning rate ($\eta$).

#Review #LoRA #Scaling Factor #Optimization Dynamics #Signal-Drift Framework #Spectral Suppression #PEFT

June 14, 2026
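As background for the LoRA entry above: in the standard LoRA formulation, the low-rank update $BA$ is added to the frozen weight with a factor of $\alpha / r$, where $r$ is the rank. The sketch below illustrates that forward pass in plain Python. It is a generic illustration, not the reviewed paper's method. The helper names `matmul` and `lora_forward` and the row-vector shape convention are assumptions made for this example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_forward(x: Matrix, w0: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Hypothetical helper: h = x @ W0 + (alpha / r) * x @ A @ B.

    Row-vector convention (an assumption for this sketch):
      x:  (batch x d_in)  input
      w0: (d_in x d_out)  frozen pretrained weight
      a:  (d_in x r)      trainable down-projection
      b:  (r x d_out)     trainable up-projection
    """
    r = len(b)  # LoRA rank = number of rows of B
    scale = alpha / r  # the scaling factor discussed in the entry above
    base = matmul(x, w0)  # frozen path
    delta = matmul(matmul(x, a), b)  # low-rank path
    return [
        [base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
        for i in range(len(base))
    ]


x = [[1.0, 2.0]]
w0 = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0], [1.0]]  # rank r = 1
b = [[0.5, -0.5]]
print(lora_forward(x, w0, a, b, alpha=1.0))  # [[2.5, 0.5]]
print(lora_forward(x, w0, a, b, alpha=2.0))  # [[4.0, -1.0]]
```

Doubling $\alpha$ doubles the contribution of the low-rank path and leaves the frozen path unchanged. Because $\alpha / r$ also multiplies the gradients that flow into `a` and `b`, its effect on training is entangled with the learning rate $\eta$. This is the relationship the entry describes as under-studied.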

[Paper Review] The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment

This paper proposes a real-time auditing framework for detecting the unpredictable, system-level risks that emerge when agents that are each aligned on their own interact with one another.

#Review #Multi-agent Safety #Emergent Misalignment #Alignment Auditing #LLM Agents #AI Control #Budget-constrained Monitoring

June 14, 2026

[Paper Review] Squeeze-Release: Iterative Pruning with Exact Structural Minimization

This paper addresses a shortcoming of conventional unstructured pruning. Although it zeroes out parameters according to their importance, it does not reduce the physical size of the tensors, so the actual compression achieved is minimal.

#Review #Network Pruning #Model Compression #Iterative Pruning #Function-preserving Transformations #Layer Normalization

June 14, 2026
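The Squeeze-Release entry above contrasts zeroed weights with physically smaller tensors. The sketch below illustrates that gap on a hypothetical bias-free two-layer ReLU network in plain Python. Magnitude-based unstructured pruning leaves the weight matrix shape unchanged. By contrast, removing a hidden unit whose incoming weights are all zero, together with the matching column of the next layer, shrinks the tensors without changing the output. This is only the simplest case of structural minimization, not the Squeeze-Release procedure itself, and all function names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def unstructured_prune(w: Matrix, threshold: float) -> Matrix:
    """Zero entries with |v| < threshold; the matrix shape does not change."""
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in w]


def drop_dead_units(w1: Matrix, w2: Matrix) -> Tuple[Matrix, Matrix]:
    """Remove hidden units whose incoming weights (a row of w1) are all zero.

    Assumes y = w2 @ relu(w1 @ x) with no biases, w1: (hidden x d_in),
    w2: (d_out x hidden). Such a unit always outputs relu(0) = 0, so dropping
    it and the matching column of w2 leaves y unchanged.
    """
    keep = [i for i, row in enumerate(w1) if any(v != 0.0 for v in row)]
    return [w1[i] for i in keep], [[row[i] for i in keep] for row in w2]


def forward(w1: Matrix, w2: Matrix, x: List[float]) -> List[float]:
    """Bias-free two-layer ReLU network."""
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    return [sum(wj * hj for wj, hj in zip(row, h)) for row in w2]


w1 = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.7]]
w2 = [[1.0, 3.0, -1.0]]
pruned = unstructured_prune(w1, threshold=0.1)  # middle row becomes all zeros
small_w1, small_w2 = drop_dead_units(pruned, w2)
x = [1.0, 1.0]
print(len(pruned), len(small_w1))  # 3 2  (hidden units before / after)
print(forward(pruned, w2, x), forward(small_w1, small_w2, x))
```

Note that unstructured pruning only yields a removable unit when an entire row happens to become zero. In general, zeros are scattered, which is why zeroing alone rarely shrinks storage.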

[Paper Review] Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

This paper identifies a new dimension along which rollout diversity can be improved in GRPO (Group Relative Policy Optimization)-based LLM training.

#Review #GRPO #LLMs #Policy-Level Diversity #Token-Level Diversity #S2L-PO #Reinforcement Learning #Mathematical Reasoning #Parameter-Level Compression

June 14, 2026
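The GRPO entry above concerns rollout diversity. As context for why it matters, note that GRPO scores each rollout relative to the other rollouts sampled for the same prompt, typically by standardizing rewards within the group. The sketch below assumes population standard deviation and a small additive `eps`, and implementations differ on both details. The function name `group_relative_advantages` is illustrative. The example shows that when every rollout in a group receives the same reward, all advantages are zero, so that group contributes no policy-gradient signal.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of rollouts for the same prompt.

    Assumption: population std and an additive eps; implementations vary.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])  # some rollouts correct
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])  # every rollout correct
print([round(a, 3) for a in mixed])  # [1.0, -1.0, 1.0, -1.0]
print(uniform)  # [0.0, 0.0, 0.0, 0.0]
```

More varied rollouts make it more likely that a group mixes high- and low-reward outputs, which is what produces nonzero advantages. The reviewed paper studies where such diversity can come from at the policy level.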

[Paper Review] Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

This paper argues that the static inference of conventional LLMs, which run every input through the same fixed depth and layer order, is inefficient and fails to fully exploit the model's latent reasoning ability.

#Review #Large Language Models #Dynamic Inference #Program-of-Layers #Test-time Scaling #Layer Skipping #Layer Recurrence #Computational Efficiency

June 14, 2026

[Paper Review] RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling

This paper aims to reduce the excessive inference latency and computational cost of existing video generation models, which stem from the quadratic complexity of 3D spatiotemporal attention.

#Review #Video Diffusion Models #Diffusion Transformers #Training-Free Acceleration #Asynchronous Scheduling #Latent Trajectory Projection #Spatiotemporal Coherence

June 14, 2026

[Paper Review] Rethinking RAG in Long Videos: What to Retrieve and How to Use It?

This paper addresses two problems facing VideoRAG systems: opaque evaluation and the lack of an established optimal retrieval strategy.

#Review #VideoRAG #Egocentric Video #V-RAGBench #CARVE #Chunk-Adaptive Reranking #Multimodal Retrieval #Long-form Video Understanding

June 14, 2026

[Paper Review] RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space

This paper takes issue with the fact that modern Text-to-Image (T2I) systems use the LLM only for text encoding. Existing systems produce nothing more than static text embeddings and leave the crucial denoising process entirely to a newly initialized DiT, an inefficient division of labor.

#Review #RepFusion #Multimodal LLMs (MLLM) #Diffusion Transformers (DiT) #Representation Autoencoders (RAE) #Denoising #Conditional Encoder #Test-time Compute

June 14, 2026

[Paper Review] RedAct: Redacting Agent Capability Traces for Procedural Skill Protection

This paper addresses a security problem: when an agent's execution traces are published for transparency and debugging, the proprietary procedural skills embedded in them can leak without authorization.

#Review #Agent Security #Trace Redaction #Procedural Skill Protection #Behavioral Watermarking #Black-box Trace Disclosure

June 14, 2026

[Paper Review] Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics

This paper reframes LLM hallucination detection as a sequential change-point detection problem in a streaming setting rather than a simple classification problem.

#Review #Sequential Change-Point Detection #Hallucination Detection #CUSUM #Lorden Bound #Information Rate #Autoregressive Model

June 14, 2026
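The hallucination-detection entry above frames the problem as sequential change-point detection, and its tags mention CUSUM. As background, the sketch below implements Page's classical CUSUM recursion $S_t = \max(0, S_{t-1} + \ell_t)$ and raises an alarm once $S_t$ reaches a threshold $h$. Here $\ell_t$ is a per-token log-likelihood ratio between the post-change and pre-change distributions. The scores in the example are made up. The reviewed paper learns its statistic rather than using a hand-specified ratio, so treat this as a generic illustration only.

```python
from typing import Iterable, List, Optional, Tuple


def cusum(llrs: Iterable[float], threshold: float) -> Tuple[Optional[int], List[float]]:
    """Page's CUSUM detector.

    llrs: per-step scores, ideally log p1(x_t) / p0(x_t) (post- vs pre-change).
    Returns (index of the first alarm or None, trajectory of the statistic).
    """
    s = 0.0
    path: List[float] = []
    for t, llr in enumerate(llrs):
        s = max(0.0, s + llr)  # reset to 0 whenever accumulated evidence turns negative
        path.append(s)
        if s >= threshold:
            return t, path  # stop at the first alarm
    return None, path


# Hypothetical per-token scores: mostly negative at first, positive from index 4 on.
scores = [-0.5, -0.2, 0.1, -0.3, 0.8, 0.9, 1.0]
alarm, path = cusum(scores, threshold=2.0)
print(alarm)  # 6
```

The reset at zero keeps a long run of normal tokens from building up evidence against a later alarm. The threshold trades false alarms against detection delay, the trade-off that bounds such as Lorden's formalize.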

[Paper Review] P3D-Bench: Benchmarking MLLMs for Parametric 3D Generation and Structural Reasoning

This paper proposes P3D-Bench to address a limitation of existing 3D generation benchmarks: they do not comprehensively evaluate program-based parametric generation.

#Review #Parametric 3D Generation #MLLM #Benchmark #CAD #Structural Reasoning #Code Generation

June 14, 2026

[Paper Review] Orchestra-o1: Omnimodal Agent Orchestration

This paper starts from the observation that existing LLM-based agents are optimized for single-modality or narrowly multimodal settings, which limits their ability to handle complex, real-world omnimodal tasks.

#Review #Omnimodal Agent #Agent Orchestration #Task Decomposition #Multi-Agent System #Reinforcement Learning #DA-GRPO

June 14, 2026

[Paper Review] OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains

This paper aims to overcome a fundamental limitation of the “video-caption-QA” paradigm used by existing automated audio-visual QA pipelines. Prior work splits videos into independent short clips, which breaks the coupling between the audio and visual modalities and leads to inconsistent entity descriptions.

#Review #Audio-Visual Reasoning #Instruction-tuning #Entity-Anchored Scripting #Clue-Guided QA Generation #Multimodal Large Language Models (MLLMs) #Evidence Chains

June 14, 2026

[Paper Review] OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

This paper proposes OmniDirector to address the limited precision of camera control in existing video generation models and the scarcity of suitable training data.

#Review #Video Generation #Camera Control #Multi-shot Cloning #Diffusion Transformers #Camera Grid #Multimodal Control #Prompt Expansion

June 14, 2026

[Paper Review] Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents

This paper identifies the root cause of why LLM agents fail to make effective use of long-term memory drawn from complex, long-horizon interaction histories.

#Review #LLM Agents #Memory Reconstruction #Graph Memory #Associative Memory #Active Retrieval #Long-horizon Reasoning

June 14, 2026