#Inference Optimization

14 posts

[Paper Review] Compiler-First State Space Duality and Portable O(1) Autoregressive Caching for Inference

A detailed review of the paper 'Compiler-First State Space Duality and Portable O(1) Autoregressive Caching for Inference', published on arXiv.

#Review #State Space Models #Mamba-2 #XLA #JAX #Compiler Codegen #Autoregressive Caching #Hardware Portability #Inference Optimization

March 10, 2026

[Paper Review] Scaling Embeddings Outperforms Scaling Experts in Language Models

A detailed review of the paper 'Scaling Embeddings Outperforms Scaling Experts in Language Models', published on arXiv.

#Review #Embedding Scaling #N-gram Embedding #Mixture-of-Experts (MoE) #Large Language Models (LLMs) #Parameter Efficiency #Inference Optimization #Speculative Decoding

January 29, 2026

[Paper Review] Sliding Window Attention Adaptation

A detailed review of the paper 'Sliding Window Attention Adaptation', published on arXiv.

#Review #Large Language Models #Sliding Window Attention #Model Adaptation #Long Context #Inference Optimization #Fine-tuning #Chain-of-Thought #Sparse Attention

December 14, 2025

[Paper Review] Learning Unmasking Policies for Diffusion Language Models

A detailed review of the paper 'Learning Unmasking Policies for Diffusion Language Models', published on arXiv.

#Review #Diffusion Language Models #Reinforcement Learning #Masked Diffusion #Sampling Policy #Inference Optimization #Markov Decision Process #Generative AI #Text Generation

December 10, 2025

[Paper Review] The Art of Scaling Test-Time Compute for Large Language Models

A detailed review of the paper 'The Art of Scaling Test-Time Compute for Large Language Models', published on arXiv by Tanmoy Chakraborty.

#Review #Test-Time Scaling #LLMs #Reasoning #Compute Efficiency #Inference Optimization #Decoding Strategies #Model Behavior

December 1, 2025

[Paper Review] Optimizing Diversity and Quality through Base-Aligned Model Collaboration

A detailed review of the paper 'Optimizing Diversity and Quality through Base-Aligned Model Collaboration', published on arXiv by Jonathan May.

#Review #Large Language Models #Generative AI #Diversity-Quality Trade-off #Model Collaboration #Inference Optimization #Routing Strategy #Text Generation

November 11, 2025

[Paper Review] LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs

A detailed review of the paper 'LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs', published on arXiv by Jason Cong.

#Review #FPGA #Large Language Models (LLM) #Inference Optimization #Memory-based Computation #Vector Quantization #Table Lookup #Hardware Acceleration

November 10, 2025

[Paper Review] The Sequential Edge: Inverse-Entropy Voting Beats Parallel Self-Consistency at Matched Compute

A detailed review of the paper 'The Sequential Edge: Inverse-Entropy Voting Beats Parallel Self-Consistency at Matched Compute', published on arXiv.

#Review #Sequential Reasoning #Parallel Self-Consistency #Inverse-Entropy Voting #LLM Reasoning #Test-Time Scaling #Inference Optimization #Iterative Refinement #Error Correction

November 9, 2025

[Paper Review] BitNet Distillation

A detailed review of the paper 'BitNet Distillation', published on arXiv.

#Review #Low-bit Quantization #LLM Compression #Knowledge Distillation #Ternary Weights #Inference Optimization #Memory Efficiency #SubLN #Continual Pre-training

October 17, 2025

[Paper Review] Attention Is All You Need for KV Cache in Diffusion LLMs

A detailed review of the paper 'Attention Is All You Need for KV Cache in Diffusion LLMs', published on arXiv.

#Review #Diffusion LLMs #KV Cache #Adaptive Caching #Inference Optimization #Attention Mechanism #Latency Reduction #Generative AI

October 17, 2025

[Paper Review] ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution

A detailed review of the paper 'ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution', published on arXiv.

#Review #Multimodal Large Language Models (MLLMs) #Dynamic Resolution #Token Compression #Semantic Awareness #Visual Consistency Learning (ViCO) #Visual Resolution Router (ViR) #Inference Optimization

October 15, 2025

[Paper Review] EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering

A detailed review of the paper 'EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering', published on arXiv.

#Review #LLM Steering Framework #vLLM Integration #Hidden State Manipulation #Inference Optimization #Extensibility #Modular Architecture #Reasoning Mitigation #Hallucination Reduction

September 30, 2025

[Paper Review] A Survey on Diffusion Language Models

A detailed review of the paper 'A Survey on Diffusion Language Models', published on arXiv by Zhiqiang Shen.

#Review #Diffusion Language Models #Generative AI #Parallel Decoding #Text Generation #Multimodal AI #Model Compression #Reinforcement Learning from Human Feedback #Inference Optimization

August 15, 2025

[Paper Review] Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

A detailed review of the paper 'Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models', published on arXiv by Jiaqi Wang.

#Review #Diffusion Large Language Models #Variable-Length Generation #Dynamic Length Adaptation #Denoising Strategy #Inference Optimization #Computational Efficiency

August 4, 2025