Review

[Paper Review] Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

This paper aims to diagnose personality perception, an ability that becomes a core requirement as MLLMs are deployed in human-centered roles such as human resource management and mental health diagnosis.

#Review #Multimodal Large Language Models #Personality Perception #Grounded Personality Reasoning #MM-OCEAN #Prejudice Gap #Holistic-Grounding Rate #Apparent Personality Recognition

May 21, 2026

[Paper Review] One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems

This paper aims to address three core problems of existing digital short-form drama production: the lack of narrative pacing, insufficient spatial consistency across clips, and heavy reliance on manual review.

#Review #Short-Form Drama #Multi-Agent System #3D-Grounded Generation #Narrative Pacing #Spatial Consistency #Production-Level Quality Control

May 21, 2026

[Paper Review] OmniPro: A Comprehensive Benchmark for Omni-Proactive Streaming Video Understanding

This paper addresses the absence of a standardized benchmark that can precisely evaluate proactive streaming understanding in real-world settings, a gap that remains despite advances in Omni-modal Large Language Models (MLLMs).

#Review #Omni-proactive streaming #Video understanding #Benchmark #Multimodal LLMs #Audio-visual perception #Long-horizon evaluation

May 21, 2026

[Paper Review] More Context, Larger Models, or Moral Knowledge? A Systematic Study of Schwartz Value Detection in Political Texts

This paper systematically analyzes how surrounding context and explicit moral knowledge affect model performance when detecting Schwartz values in political texts. Political discourse often expresses values indirectly, which makes sentence-level classification very difficult.

#Review #Schwartz Values #Political Text #Retrieval-Augmented Generation (RAG) #DeBERTa #Large Language Models (LLMs) #Context Analysis

May 21, 2026

[Paper Review] Minimalist Visual Inertial Odometry

This study aims to address the limitations that the high power consumption and computational demands of existing VIO (Visual-Inertial Odometry) systems impose on resource-constrained robot platforms.

#Review #Visual-Inertial Odometry #Minimalist Vision #Planar Odometry #Gabor Masks #Photodiode #Temporal Convolutional Network #Motion Estimation

May 21, 2026

[Paper Review] Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

This paper aims to address the Coordination Bottleneck, in which modern LLM agents fail to make effective use of modular skills and of diverse expert models that are strong in specific domains.

#Review #Reinforcement Learning #Multimodal Agent #Orchestration #Skill Library #Expert Models #Hierarchical Registry

May 21, 2026

[Paper Review] Lean Refactor: Multi-Objective Controllable Proof Optimization via Agentic Strategy Search

This paper addresses the problem that Lean 4 proofs generated by LLMs, although correct, are excessively verbose and brittle with respect to specific library versions.

#Review #Lean 4 #Proof Optimization #Agentic Framework #Retrieval-Augmented Generation #Multi-Objective Optimization #Formal Verification

May 21, 2026

[Paper Review] LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

This paper addresses the problem that existing Explicit Text CoT-based MLLMs compress high-dimensional audio-visual information through the narrow bottleneck of text, and therefore miss fine-grained temporal alignment and semantic connections across modalities.

#Review #Multimodal Large Language Models #Audio-Visual Reasoning #Latent Reasoning #Cross-modal Alignment #Chain-of-Thought #Instruction Tuning

May 21, 2026

[Paper Review] KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

This paper aims to address a major bottleneck in Disaggregated LLM Serving, where KV cache communication accounts for up to 60% of total end-to-end latency.

#Review #LLM Serving #KV Cache Compression #Disaggregated Inference #Bayesian Optimization #Service-Aware Control

May 21, 2026

[Paper Review] GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation

This paper emphasizes that open-ended image generation goes beyond a simple text-prompt-based task and is a complex agentic process that must effectively combine the model's internal knowledge with external resources.

#Review #Image Generation #Agentic Workflow #Self-Evolving #Visual Experience Distillation #Tool-Orchestrated #On-Policy Distillation #Multimodal Agent

May 21, 2026

[Paper Review] Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

This paper aims to address a structural limitation of Linear Attention-based models, in which the erase and write operations at the core of memory editing are tied together by a single scalar gate.

#Review #Linear Attention #Recurrent Neural Networks #Delta Rule #Fast-Weight Memory #Selective State Space #Chunkwise Parallel Training #Long-Context Retrieval

May 21, 2026

[Paper Review] Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

To address the quadratic cost of full attention in long-context inference, this paper proposes converting full attention into an efficient sparse structure.

#Review #Long-context LLM #Sparse Attention #Head Specialization #Dynamic Top-p Selection #Efficient Inference #Self-distillation

May 21, 2026

[Paper Review] From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning

This paper aims to address the efficiency limits of existing RLVR methods on difficult reasoning problems. On such problems, paths that reach the final answer are very sparse, so even when the model reasons correctly at intermediate steps, it is hard to turn that into a training signal.

#Review #Curriculum Reinforcement Learning #LLM Reasoning #Credit Assignment #Verifiable Rewards #Subproblem Decomposition #RLVR

May 21, 2026

[Paper Review] FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

This paper aims to address the quality degradation and temporal inconsistency that arise when the generation range of video Diffusion models is extended beyond the trained context length.

#Review #Long Video Generation #Flow Matching #Tweedie Matching #Stochastic Early-Phase Sampling #Inference-time Framework #Diffusion Models

May 21, 2026

[Paper Review] Diversed Model Discovery via Structured Table Discovery

This paper addresses the problem that existing model search systems rely excessively on text-centric semantic similarity, which reduces the diversity of results and fails to provide enough comparable information.

#Review #Model Lake #Model Search #Structured Semantic Search #Table Discovery #Nugget-based Evaluation #Model Cards

May 21, 2026

[Paper Review] DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

This paper proposes DelTA to resolve the opacity that arises when a sequence-level reward is converted into token-level training signals. Existing RLVR methods assign a single scalar reward to the entire response, but the actual policy update happens per token, so there is a mismatch in granularity.

#Review #RLVR #Credit Assignment #Discriminator #Policy-Gradient #Token-Level #Centroid

May 21, 2026

[Paper Review] DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders

This paper aims to address the problem that the low spatial reconstruction ability of the frozen VFM encoder in RAE limits high-quality image generation and fine-grained editing. Existing RAE models preserve high-level semantic information well, but because of the VFM training objective, low-level details such as color and texture tend to be lost.

#Review #Representation Autoencoders #Vision Foundation Models #Detail-Condensing Queries #Latent Diffusion Models #Image Tokenizer #Reconstruction-Generation Trade-off

May 21, 2026

[Paper Review] ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

This study starts from the observation that existing medical LLMs and agentic systems are stuck in a passive paradigm that relies only on already-curated evidence.

#Review #ClinSeekAgent #Agentic Clinical Reasoning #Multimodal Evidence Seeking #EHR Retrieval #Clinical Decision Support #LLM Agent #Trajectory Distillation

May 21, 2026

[Paper Review] Bernini: Latent Semantic Planning for Video Diffusion

This paper focuses on the fact that, although modern MLLMs and video diffusion models offer advanced reasoning ability and realistic synthesis ability respectively, there is a lack of frameworks that integrate the two effectively.

#Review #Video Diffusion #Multimodal Large Language Models #Latent Semantic Planning #Diffusion Transformer #Video Editing #Chain-of-Thought

May 21, 2026

[Paper Review] ACC: Compiling Agent Trajectories for Long-Context Training

This study aims to address the Supervision Blind Spot problem, in which existing agent training (SFT) masks out tool responses and therefore fails to exploit key evidence located in long-range context.

#Review #Agent Trajectories #Long-Context Training #Supervision Blind Spot #Agent Context Compilation #Dependency Modeling #Expert Specialization

May 21, 2026