Latest Posts

[Paper Review] Personal AI Agent for Camera Roll VQA

This work addresses the limitations of a VQA setting in which a conversational AI searches a user's entire personal Camera Roll for photos and answers questions about them.

#Review #Personal AI Agent #Camera Roll #Visual Question Answering #Long-horizon Memory #Hierarchical Memory #Multimodal LLM #Agentic Workflow

June 4, 2026

[Paper Review] OPRD: On-Policy Representation Distillation

This paper identifies two inherent limitations of On-Policy Distillation (OPD), a method essential to the post-training of Large Language Models (LLMs), and proposes OPRD (On-Policy Representation Distillation), a new approach designed to overcome them.

#Review #On-Policy Distillation #Representation Distillation #Large Language Models #Knowledge Distillation #Hidden States #Mathematical Reasoning #Variance Reduction

June 4, 2026

[Paper Review] Multimodal Music Recommendation System using LLMs

This paper addresses the problem that modern music recommendation systems treat songs only as independent ID tokens and therefore overlook their semantic and acoustic content. Existing ID-based models also degrade in cold-start settings, where interaction data is scarce.

#Review #Music Recommendation #Multimodal Learning #Large Language Models #Sequential Recommendation #Audio Embeddings #Metadata Enrichment

June 4, 2026

[Paper Review] Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents

This paper tackles the performance degradation that memory-based LLM agents suffer when carrying out long-horizon tasks.

#Review #LLM Agents #Long-Horizon Reasoning #Belief Entropy #Memory Optimization #Reinforcement Learning #Metacognition

June 4, 2026

[Paper Review] MechVQA: Benchmarking and Enhancing Multimodal LLMs on Comprehensive Mechanical Drawing Understanding

This paper addresses the failure of general-purpose Multimodal Large Language Models (MLLMs) to correctly interpret mechanical engineering drawings, given the drawings' complexity and domain specificity.

#Review #Multimodal Large Language Models #Mechanical Drawing Understanding #Visual Question Answering #Spatial Reasoning #Reinforcement Learning #Domain-Specialized Benchmark

June 4, 2026

[Paper Review] MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

This paper proposes MLEvolve to address three problems faced by existing LLM-based Machine Learning Engineering (MLE) agents: information isolation, insufficient memory, and inefficient long-horizon optimization.

#Review #Automated Machine Learning #LLM Agents #Monte Carlo Graph Search #Self-Evolving #Long-Horizon Optimization #Algorithm Discovery

June 4, 2026

[Paper Review] LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

This work addresses two problems of existing Unified Video Generation models: they rely on very large parameter counts (13B or more), and during video editing their computational cost grows steeply because the source-video tokens are concatenated to the input.

#Review #Video Generation #Video Editing #Multimodal Large Language Model (MLLM) #Diffusion Transformer (DiT) #Deepstack Injection #Scale-and-Add

June 4, 2026

[Paper Review] Latent Reasoning with Normalizing Flows

This work addresses the high inference cost and low information density of conventional text-based Chain-of-Thought (CoT).

#Review #Chain-of-Thought #Normalizing Flows #Latent Reasoning #Large Language Models #Likelihood-based Modeling #Code Generation

June 4, 2026

[Paper Review] LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs

This paper points out a limitation of existing memorization evaluations for large language models (LLMs): they focus too heavily on measuring capability. Prior work has mostly measured only how much training data a model can be made to output in adversarial settings such as prefix attacks.

#Review #Large Language Models #Memorization #Propensity-Aware Evaluation #Data Leakage #SimpleTrace #PropMe #Adversarial Attack

June 4, 2026

[Paper Review] Is This Edit Correct? A Multi-Dimensional Benchmark for Reasoning-Aware Image Editing

This paper addresses the problem that current diffusion-based image editing systems focus on surface-level instruction following and consequently produce results that lack logical consistency.

#Review #Image Editing #Reasoning-aware #Benchmark #Diffusion Models #Multi-modal LLMs #Logic Consistency #EditRefine

June 4, 2026

[Paper Review] Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

This paper addresses the loss of visual information that occurs when existing Video MLLMs rely on text-based Chain-of-Thought (CoT) to predict future events (Video Event Prediction, VEP).

#Review #Video Event Prediction #Multimodal Large Language Models #Latent Visual Reasoning #Interleaved Reasoning #Reinforcement Learning #Future-L1 #LA-DAPO

June 4, 2026

[Paper Review] ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

This paper raises a fundamental question: how sound are the judgments of autonomous research agents when a decision requires forecasting the future direction of technological development?

#Review #LLM Agents #Foresight Evaluation #Scientific Judgment #Temporal Integrity #Benchmark #Research Forecasting

June 4, 2026

[Paper Review] Flash-WAM: Modality-Aware Distillation for World Action Models

This paper addresses the high inference latency of World Action Models (WAMs), which hinders real-time control despite their strong performance on manipulation benchmarks. Existing WAMs must run dozens of iterative denoising steps for both video and action, which makes them unsuitable for real-time robot control.

#Review #World-Action Models #Step Distillation #Consistency Models #Robotic Foundation Models #Flow Matching #Modality-Aware Distillation

June 4, 2026

[Paper Review] EvoDS: Self-Evolving Autonomous Data Science Agent with Skill Learning and Context Management

Existing data science agents rely on fixed task workflows and a limited action space, so they lack the ability to systematically accumulate and reuse experience.

#Review #Data Science Agent #Multi-Agent System #Self-Evolving #Agent Skill #Agentic Reinforcement Learning

June 4, 2026

[Paper Review] Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

This paper examines whether video generation models go beyond producing visually plausible videos and actually function as "World Models" that internalize real-world physical laws.

#Review #Video Generation Models #Robotic Manipulation #Physical Executability #Benchmark #Sim-to-Real #World Models

June 4, 2026

[Paper Review] Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning

This paper addresses a fundamental limitation of existing autonomous driving systems: they do not explicitly model action-conditioned dynamics and instead rely on a simple direct state-to-action mapping.

#Review #Autonomous Driving #World Model #Discrete Diffusion #Token Editing #Policy Learning #Counterfactual Reasoning

June 4, 2026

[Paper Review] Complexity-Balanced Diffusion Splitting

This paper addresses the inefficiency of the single monolithic network used by standard diffusion models. In the conventional setup, one fixed network handles everything from simple noise to complex data structure, so model capacity cannot be allocated to the generation stages that need it most.

#Review #Diffusion Models #Complexity-Balanced Splitting #Temporal Capacity Allocation #De Boor Principle #Dirichlet Energy #Path Acceleration #Generative Flow

June 4, 2026

[Paper Review] Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

This paper addresses the key bottleneck to scaling RLVR: the scarcity of challenging, verifiable code data.

#Review #RLVR #Synthetic Data #Atomic Decomposition #Code Generation #Scaling #Reinforcement Learning

June 4, 2026

[Paper Review] Benchmark Everything Everywhere All at Once

This paper addresses the limitations of manual benchmark construction: it is labor-intensive, the resulting benchmarks cannot be reused, and they saturate quickly as model performance improves.

#Review #Benchmark Agent #Autonomous Evaluation #Benchmark Construction #MLLM-as-a-Judge #Agentic Workflow #Performance Saturation

June 4, 2026

[Paper Review] ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?

This work addresses the lack of behavioral consistency that arises because existing Role-Playing Language Agent (RPLA) benchmarks treat characters as static personas unrelated to the flow of the narrative.

#Review #Role-Playing Language Agents #Character Arc #Narrative Evaluation #Temporal Alignment #Language Model Benchmarking #Persona Grounding

June 4, 2026