#Hardware Acceleration

4 posts

[Paper Review] OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale

This paper aims to resolve the inherent trade-off in MoE architectures between the granularity of expert specialization and hardware execution efficiency.

#Review #Mixture-of-Experts (MoE) #Fine-Grained Experts #Efficient Architectures #Transformer #Routing Algorithms #Hardware Acceleration #Sparse Models

February 8, 2026

[Paper Review] Towards Automated Kernel Generation in the Era of LLMs

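This paper aims to address the non-scalability of high-performance kernel generation and optimization, a problem that fundamentally limits the performance of modern AI systems.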

#Review #Large Language Models #Kernel Generation #GPU Optimization #AI Agents #Code Synthesis #Performance Engineering #Hardware Acceleration

January 22, 2026

[Paper Review] SQ-format: A Unified Sparse-Quantized Hardware-friendly Data Format for LLMs

The goal is to address the difficulty that low-bit quantization and sparsification techniques have in balancing accuracy and efficiency when deploying large language models (LLMs).

#Review #LLM Quantization #Sparsification #Hardware Acceleration #Mixed-Precision #Post-Training Quantization #Data Format #GPU Optimization #AI Accelerator

December 7, 2025

[Paper Review] LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs

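This paper aims to exploit the strengths of FPGAs for efficient single-batch large language model (LLM) inference. In particular, by shifting from conventional arithmetic-based operations to memory-based operations, it seeks to overcome the performance and energy-efficiency limitations of FPGAs relative to GPUs and to develop a key technology for on-device AI.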

#Review #FPGA #Large Language Models (LLM) #Inference Optimization #Memory-based Computation #Vector Quantization #Table Lookup #Hardware Acceleration

November 10, 2025