#Efficient Inference

19 posts

[Paper Review] A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens

Rather than modeling entire video frames, the authors propose DeltaTok, which compresses only the change (delta) between frames, and DeltaWorld, which performs generative inference on top of it. DeltaTok conditions on the previous frame's features and encodes the difference from the current frame into a single token, turning a video into a purely temporal sequence.

#Review #Generative World Modeling #Delta Tokens #Visual Tokenization #Vision Foundation Models #Best-of-Many Training #Spatio-temporal Redundancy #Efficient Inference

April 8, 2026

[Paper Review] ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning

A detailed review of the paper 'ResAdapt: Adaptive Resolution for Efficient Multimodal Reasoning', posted to arXiv by Shizhu He.

#Review #Multimodal Large Language Models (MLLMs) #Input-side Adaptation #Contextual Bandit #Cost-Aware Policy Optimization (CAPO) #Visual Budgeting #Efficient Inference #Temporal Reasoning

March 30, 2026

[Paper Review] On-Policy Self-Distillation for Reasoning Compression

A detailed review of the paper 'On-Policy Self-Distillation for Reasoning Compression', posted to arXiv by Zhipeng Wang.

#Review #Reasoning Compression #Self-Distillation #On-Policy Learning #Large Language Models #Mathematical Reasoning #Knowledge Distillation #Efficient Inference

March 5, 2026

[Paper Review] dLLM: Simple Diffusion Language Modeling

A detailed review of the paper 'dLLM: Simple Diffusion Language Modeling', posted to arXiv.

#Review #Diffusion Language Models #Open-source Framework #Modular Design #Masked Diffusion #Block Diffusion #Language Model Finetuning #Efficient Inference #Evaluation Pipeline

March 1, 2026

[Paper Review] Does Your Reasoning Model Implicitly Know When to Stop Thinking?

A detailed review of the paper 'Does Your Reasoning Model Implicitly Know When to Stop Thinking?', posted to arXiv.

#Review #Large Reasoning Models #Chain of Thought #Efficient Inference #Self-Aware Sampling #Reinforcement Learning #Reasoning Termination #Mathematical Benchmarks

February 22, 2026

[Paper Review] LTX-2: Efficient Joint Audio-Visual Foundation Model

A detailed review of the paper 'LTX-2: Efficient Joint Audio-Visual Foundation Model', posted to arXiv by Andrew Kvochko.

#Review #Multimodal AI #Text-to-Audio-Video #Diffusion Transformer #Cross-Modal Attention #Classifier-Free Guidance #Efficient Inference #Foundation Model

January 6, 2026

[Paper Review] HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

A detailed review of the paper 'HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices', posted to arXiv by Yuhang Dong.

#Review #Multimodal Large Language Model #Edge AI #Efficient Inference #Visual Resolution Compressor #Dual Consistency Learning #Vision Transformer #Quantization #Low-Latency

December 17, 2025

[Paper Review] TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

A detailed review of the paper 'TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows', posted to arXiv.

#Review #Generative Models #One-step Generation #Self-Adversarial Learning #Flow Matching #Large Language Models #Text-to-Image #Efficient Inference #Diffusion Models

December 7, 2025

[Paper Review] LFM2 Technical Report

A detailed review of the paper 'LFM2 Technical Report', posted to arXiv.

#Review #Edge AI #Foundation Models #Hybrid Architecture #Knowledge Distillation #Multimodal AI #On-device Deployment #Efficient Inference #LLM Optimization

December 1, 2025

[Paper Review] HunyuanVideo 1.5 Technical Report

A detailed review of the paper 'HunyuanVideo 1.5 Technical Report', posted to arXiv by Fang Yang.

#Review #Video Generation #Diffusion Transformer #Sparse Attention #Super-Resolution #Open-Source #Multimodal Understanding #Training Optimization #Efficient Inference

November 24, 2025

[Paper Review] NVIDIA Nemotron Nano V2 VL

A detailed review of the paper 'NVIDIA Nemotron Nano V2 VL', posted to arXiv.

#Review #Vision-Language Model #Hybrid Architecture #Mamba-Transformer #Long-Context Understanding #Quantization #Efficient Inference #Document AI #Video AI

November 9, 2025

[Paper Review] Kimi Linear: An Expressive, Efficient Attention Architecture

A detailed review of the paper 'Kimi Linear: An Expressive, Efficient Attention Architecture', posted to arXiv.

#Review #Linear Attention #Hybrid Architecture #Kimi Delta Attention (KDA) #Gating Mechanism #Long-Context Modeling #Efficient Inference #Transformer

October 31, 2025

[Paper Review] RegionE: Adaptive Region-Aware Generation for Efficient Image Editing

A detailed review of the paper 'RegionE: Adaptive Region-Aware Generation for Efficient Image Editing', posted to arXiv by Peng Ye.

#Review #Instruction-based Image Editing #Diffusion Models #Efficient Inference #Region-Aware Generation #Adaptive Caching #Spatial Redundancy #Temporal Redundancy

October 30, 2025

[Paper Review] SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

A detailed review of the paper 'SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer', posted to arXiv.

#Review #Video Generation #Diffusion Model #Linear Attention #Transformer #Long Video #Efficient Inference #Constant Memory #Low-Cost Training #RTX Deployment

September 30, 2025

[Paper Review] Quantized Visual Geometry Grounded Transformer

A detailed review of the paper 'Quantized Visual Geometry Grounded Transformer', posted to arXiv by Yuqi Li.

#Review #Quantization #Post-Training Quantization #3D Reconstruction #Visual Transformer #Model Compression #Efficient Inference #Hadamard Rotation #Calibration Sampling

September 26, 2025

[Paper Review] MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

A detailed review of the paper 'MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe', posted to arXiv by Wenshuo Ma.

#Review #MLLM Efficiency #Multimodal Transformer #3D-Resampler #Document AI #Hybrid Reinforcement Learning #Video Understanding #Efficient Inference

September 24, 2025

[Paper Review] Causal Attention with Lookahead Keys

A detailed review of the paper 'Causal Attention with Lookahead Keys', posted to arXiv by Quanquan Gu.

#Review #Causal Attention #Lookahead Keys #Autoregressive Modeling #Language Models #Transformer #Perplexity Reduction #Parallel Training #Efficient Inference

September 10, 2025

[Paper Review] UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

A detailed review of the paper 'UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning', posted to arXiv by Ran Guo.

#Review #Memory Networks #Mixture of Experts (MoE) #Long-Context Learning #Sparse Models #Transformer Architecture #LLMs #Efficient Inference

August 27, 2025

[Paper Review] MiDashengLM: Efficient Audio Understanding with General Audio Captions

A detailed review of the paper 'MiDashengLM: Efficient Audio Understanding with General Audio Captions', posted to arXiv by Yadong Niu.

#Review #Audio-Language Model #General Audio Captions #Audio Understanding #Speech Recognition #Efficient Inference #Public Datasets #Multimodality #Data Curation

August 7, 2025