Review

[Paper Review] HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

This paper aims to address the difficulty of deploying multimodal large language models (MLLMs) on-device, which stems from their high compute and memory requirements.

#Review #Multimodal Large Language Model #Edge AI #Efficient Inference #Visual Resolution Compressor #Dual Consistency Learning #Vision Transformer #Quantization #Low-Latency

December 17, 2025

[Paper Review] Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

This paper aims to address the high latency caused by sequential autoregressive (AR) decoding in large language models (LLMs), enabling efficient parallel decoding while preserving the generation quality and causal inference properties of AR models.

#Review #Parallel Decoding #Causal LLM #Jacobi Decoding #Consistency Distillation #Transformer Inference #Latency Reduction #Rejection Recycling #Multi-block Decoding

December 17, 2025

[Paper Review] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

This paper addresses the performance degradation of existing diffusion vision language models (dVLMs) and their inefficiency in variable-length generation and KV cache reuse.

#Review #Diffusion Models #Vision Language Models #Autoregressive Models #Diffusion Finetuning #Block Diffusion #Multimodal AI #KV Cache

December 17, 2025

[Paper Review] DEER: Draft with Diffusion, Verify with Autoregressive Models

This paper addresses the efficiency problems of LLM-based agents and reasoning systems that arise from the inherent latency of autoregressive (AR) decoding. In particular, it aims to overcome the limited speedup of existing AR-based drafters, which results from step-by-step uncertainty accumulation and sequential decoding.

#Review #Speculative Decoding #Diffusion LLM #Autoregressive Model #Inference Acceleration #Model Alignment #Code Generation #Block Regeneration

December 17, 2025

[Paper Review] Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

This paper argues that the exploration mechanisms used in reinforcement learning (RL) for LLMs are fundamentally misaligned with how the model actually learns.

#Review #Reinforcement Learning #Large Language Models #Exploration Strategy #Gradient-Guided #Reward Shaping #Reasoning #PPO

December 17, 2025

[Paper Review] Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

This paper addresses the problem of authenticity detection raised by the high realism of recent AI-generated videos.

#Review #AIGC Detection #ASMR Videos #VLM Evaluation #VGM Realism #Audio-Visual Consistency #Perceptual Fidelity #Adversarial Benchmark #Deepfake Detection

December 16, 2025

[Paper Review] Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models

This paper addresses the slow inference of Masked Diffusion Models (MDMs), which stems from two main inefficiencies: the lack of KV caching support and the unnecessary processing of mask tokens. In particular, it aims to propose a new modeling framework that substantially improves efficiency across multimodal tasks without degrading performance.

#Review #Discrete Diffusion Models #Multimodal Models #Sparse Parameterization #KV Caching #Token Truncation #Image Generation #Image Editing #Visual Reasoning

December 16, 2025

[Paper Review] ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

This paper focuses on the limitations of existing image generation and unified models on complex tasks that require deep reasoning, planning, and precise data-to-visual mapping.

#Review #Table Visualization #Infographic Generation #Multi-modal Large Language Models (MLLMs) #Diffusion Models #Self-Correction #Reinforcement Learning #Graphic Design #Data-to-Visual Mapping

December 16, 2025

[Paper Review] RecGPT-V2 Technical Report

RecGPT-V2 aims to resolve four fundamental limitations of the LLM-based recommender system in RecGPT-V1: computational inefficiency, insufficient explanation diversity, limited generalization, and simplistic evaluation.

#Review #Recommender Systems #Large Language Models #Multi-Agent Systems #Reinforcement Learning #Dynamic Prompting #Hybrid Representation #Agentic Evaluation #Explanation Generation

December 16, 2025

[Paper Review] Olmo 3

Olmo 3 aims to introduce a family of state-of-the-art, fully-open language and thinking models at the 7B and 32B parameter scales. The core of this work is the full release of the model's entire lifecycle (including every stage, checkpoint, data point, and dependency), offering unlimited customization and research opportunities.

#Review #Large Language Models #Open-Source AI #Model Flow #Long-Context Reasoning #Instruction Following #Function Calling #Thinking Models #Data Curation #Reinforcement Learning

December 16, 2025

[Paper Review] MMGR: Multi-Modal Generative Reasoning

This paper addresses the limitations of evaluating large-scale text-to-video models, in particular the problem of assessing reasoning ability beyond perceptual fidelity.

#Review #Multi-Modal Generative Models #Reasoning Evaluation #World Models #Physical Commonsense #Abstract Reasoning #Embodied Navigation #VLM-based Evaluation #Temporal Consistency

December 16, 2025

[Paper Review] Janus: Disaggregating Attention and Experts for Scalable MoE Inference

This work addresses the high resource demands, dynamic workloads, and heterogeneous compute requirements between attention and expert layers that arise during inference with large-scale Mixture-of-Experts (MoE) models.

#Review #MoE Inference #Disaggregated Architecture #Resource Management #Scalability #Load Balancing #GPU Utilization #Communication Optimization

December 16, 2025

[Paper Review] A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning

This paper addresses the limitation that existing end-to-end affordance prediction models, in which high-level reasoning and low-level grounding are tightly coupled, struggle to generalize to novel objects and complex instructions.

#Review #Affordance Prediction #Zero-Shot Learning #Agentic AI #Foundation Models #Multimodal Reasoning #Visual Grounding #Image Generation #Robotics

December 16, 2025

[Paper Review] V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions

This paper addresses the difficulty that existing VLMs have with multi-step exploration and dynamic planning in complex, open-ended visual reasoning tasks. It aims to develop a benchmark (V-REX) and an evaluation protocol for quantitatively assessing the exploratory visual reasoning ability of VLMs, which is hard to evaluate because of the large exploration space.

#Review #Visual Reasoning #Multi-step Exploration #Chain-of-Questions (CoQ) #Vision-Language Models (VLMs) #Benchmarking #Planning #Following

December 15, 2025

[Paper Review] Towards Scalable Pre-training of Visual Tokenizers for Generation

This paper aims to solve the "pre-training scaling problem," in which the latent space of visual tokenizers (e.g., VAEs) is biased toward low-level information and therefore does not translate into high-quality generation.

#Review #Visual Tokenizers #Pre-training #Latent Diffusion Models #Generative Models #Vision Transformer #Contrastive Learning #Self-Supervised Learning #Scaling Laws

December 15, 2025

[Paper Review] Towards Interactive Intelligence for Digital Humans

This paper aims to address the lack of interaction logic and autonomy in existing imitative digital humans and to realize "Interactive Intelligence," which combines personality-aligned expression, adaptive interaction, and self-evolution capabilities.

#Review #Digital Human #Interactive Intelligence #Multimodal Interaction #LLM Agent #Real-time Animation #Persona Fidelity #Diffusion Models

December 15, 2025

[Paper Review] Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection

This paper aims to overcome the limitations of Visual Question Answering (VQA) with existing Vision-Language Models (VLMs), which are confined to static images, and to train agents with ambulatory vision capabilities to actively select more informative viewpoints.

#Review #Active Perception #Vision-Language Models (VLMs) #Embodied AI #View Selection #Reinforcement Learning (RL) #Supervised Fine-Tuning (SFT) #Visual Question Answering (VQA) #3D Environments

December 15, 2025

[Paper Review] Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge

This work focuses on the problem of physical agents successfully completing long-horizon tasks in simulated environments in the 2025 BEHAVIOR Challenge, and aims to overcome the limitations of existing Vision-Language-Action (VLA) models.

#Review #Embodied AI #Long-horizon Tasks #Vision-Language-Action Models (VLA) #BEHAVIOR Challenge #Offline RL #Pre-training #Rejection Sampling Fine-Tuning (RFT) #Robotics

December 15, 2025

[Paper Review] NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

This paper addresses the failure of existing coding agent benchmarks to rigorously evaluate the long-horizon reasoning ability needed to build complete software systems.

#Review #Coding Agents #LLMs #Software Engineering #Repository Generation #Long-Horizon Reasoning #Benchmark #Python Development #Autonomous Systems

December 15, 2025

[Paper Review] Memory in the Age of AI Agents

This survey paper aims to resolve the fragmentation and lack of conceptual clarity in the rapidly growing field of AI agent memory research and to overcome the limitations of existing taxonomies.

#Review #AI Agents #Memory Systems #LLMs #Taxonomy #Continual Learning #Self-Evolution #Multimodality #Reinforcement Learning

December 15, 2025