HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis

29 April 2024

Papers citing "HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis"

Title
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
    Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, ..., Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai
    57 15 0, 06 Oct 2024
Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models
    Joseph McDonald, Baolin Li, Nathan C. Frey, Devesh Tiwari, V. Gadepally, S. Samsi
    24 44 0, 19 May 2022
I-BERT: Integer-only BERT Quantization
    Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer
    MQ, 86 336 0, 05 Jan 2021
Fast inference of deep neural networks in FPGAs for particle physics
    Javier Mauricio Duarte, Song Han, Philip C. Harris, S. Jindariani, E. Kreinar, ..., J. Ngadiuba, M. Pierini, R. Rivera, N. Tran, Zhenbin Wu
    AI4CE, 75 386 0, 16 Apr 2018