A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani
arXiv:2307.07982 · 16 July 2023

Papers citing "A Survey of Techniques for Optimizing Transformer Inference" (48 papers shown)
  • PHEONA: An Evaluation Framework for Large Language Model-based Approaches to Computational Phenotyping · Sarah Pungitore, Shashank Yadav, Vignesh Subbian · LM&MA · 25 Mar 2025
  • Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification · Arshia Kermani, Ehsan Zeraatkar, Habib Irani · 23 Feb 2025
  • A Comparative Study of Pruning Methods in Transformer-based Time Series Forecasting · Nicholas Kiefer, Arvid Weyrauch, Muhammed Öz, Achim Streit, Markus Götz, Charlotte Debus · AI4TS · 17 Dec 2024
  • MPQ-DM: Mixed Precision Quantization for Extremely Low Bit Diffusion Models · Weilun Feng, Haotong Qin, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Renshuai Tao, Y. Xu, Michele Magno · DiffM, MQ · 16 Dec 2024
  • Optimising TinyML with Quantization and Distillation of Transformer and Mamba Models for Indoor Localisation on Edge Devices · Thanaphon Suwannaphong, Ferdian Jovan, I. Craddock, Ryan McConville · Mamba · 12 Dec 2024
  • LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators · Krishna Teja Chitty-Venkata, Siddhisanket Raskar, B. Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, M. Emani, V. Vishwanath · 31 Oct 2024
  • BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching · Peizhuang Cong, Qizhi Chen, Haochen Zhao, Tong Yang · KELM · 24 Oct 2024
  • LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models · David Hoffmann, Kailash Budhathoki, Matthaeus Kleindessner · 17 Oct 2024
  • Resource-aware Mixed-precision Quantization for Enhancing Deployability of Transformers for Time-series Forecasting on Embedded FPGAs · Tianheng Ling, Chao Qian, Gregor Schiele · 04 Oct 2024
  • Protecting Vehicle Location Privacy with Contextually-Driven Synthetic Location Generation · Sourabh Yadav, Chenyang Yu, Xinpeng Xie, Yan Huang, Chenxi Qiu · AAML · 14 Sep 2024
  • Inference Optimization of Foundation Models on AI Accelerators · Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas M. Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, V. Cevher, Yida Wang, George Karypis · 12 Jul 2024
  • Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference · Christopher Wolters, Xiaoxuan Yang, Ulf Schlichtmann, Toyotaro Suzumura · 12 Jun 2024
  • Mapping Parallel Matrix Multiplication in GotoBLAS2 to the AMD Versal ACAP for Deep Learning · Jie Lei, Enrique S. Quintana-Ortí · 23 Apr 2024
  • Accelerating ViT Inference on FPGA through Static and Dynamic Pruning · Dhruv Parikh, Shouyi Li, Bingyi Zhang, Rajgopal Kannan, Carl E. Busart, Viktor Prasanna · 21 Mar 2024
  • Accurate LoRA-Finetuning Quantization of LLMs via Information Retention · Haotong Qin, Xudong Ma, Xingyu Zheng, Xiaoyang Li, Yang Zhang, Shouda Liu, Jie Luo, Xianglong Liu, Michele Magno · MQ · 08 Feb 2024
  • QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning · Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan · DiffM, MQ · 06 Feb 2024
  • The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute · Aleksandar Stanić, Dylan R. Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag · MoE · 20 Sep 2023
  • Making AI Less "Thirsty": Uncovering and Addressing the Secret Water Footprint of AI Models · Pengfei Li, Jianyi Yang, M. A. Islam, Shaolei Ren · 06 Apr 2023
  • PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing · Xiaozhe Ren, Pingyi Zhou, Xinfan Meng, Xinjing Huang, Yadao Wang, ..., Jiansheng Wei, Xin Jiang, Teng Su, Qun Liu, Jun Yao · ALM, MoE · 20 Mar 2023
  • BEBERT: Efficient and Robust Binary Ensemble BERT · Jiayi Tian, Chao Fang, Hong Wang, Zhongfeng Wang · MQ · 28 Oct 2022
  • GLM-130B: An Open Bilingual Pre-trained Model · Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, ..., Jidong Zhai, Wenguang Chen, Peng-Zhen Zhang, Yuxiao Dong, Jie Tang · BDL, LRM · 05 Oct 2022
  • Improving alignment of dialogue agents via targeted human judgements · Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving · ALM, AAML · 28 Sep 2022
  • Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention · Xiangcheng Liu, Tianyi Wu, Guodong Guo · ViT · 28 Sep 2022
  • DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation · Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim · 22 Sep 2022
  • PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers · Zhikai Li, Mengjuan Chen, Junrui Xiao, Qingyi Gu · ViT, MQ · 13 Sep 2022
  • I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference · Zhikai Li, Qingyi Gu · MQ · 04 Jul 2022
  • SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences · Guan Shen, Jieru Zhao, Quan Chen, Jingwen Leng, C. Li, Minyi Guo · 29 Jun 2022
  • BiT: Robustly Binarized Multi-distilled Transformer · Zechun Liu, Barlas Oğuz, Aasish Pappu, Lin Xiao, Scott Yih, Meng Li, Raghuraman Krishnamoorthi, Yashar Mehdad · MQ · 25 May 2022
  • Training language models to follow instructions with human feedback · Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe · OSLM, ALM · 04 Mar 2022
  • Multi-Tailed Vision Transformer for Efficient Inference · Yunke Wang, Bo Du, Wenyuan Wang, Chang Xu · ViT · 03 Mar 2022
  • MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer · Sachin Mehta, Mohammad Rastegari · ViT · 05 Oct 2021
  • Primer: Searching for Efficient Transformers for Language Modeling · David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam M. Shazeer, Quoc V. Le · VLM · 17 Sep 2021
  • What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers · Boseop Kim, Hyoungseok Kim, Sang-Woo Lee, Gichang Lee, Donghyun Kwak, ..., Jaewook Kang, Inho Kang, Jung-Woo Ha, W. Park, Nako Sung · VLM · 10 Sep 2021
  • Mobile-Former: Bridging MobileNet and Transformer · Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Xiaoyi Dong, Lu Yuan, Zicheng Liu · ViT · 12 Aug 2021
  • Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning · Xinzhi Dong, Chengjiang Long, Wenju Xu, Chunxia Xiao · ViT · 05 Aug 2021
  • MLP-Mixer: An all-MLP Architecture for Vision · Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy · 04 May 2021
  • Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions · Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao · ViT · 24 Feb 2021
  • Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching · Mingi Ji, Byeongho Heo, Sungrae Park · 05 Feb 2021
  • Bottleneck Transformers for Visual Recognition · A. Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani · SLR · 27 Jan 2021
  • I-BERT: Integer-only BERT Quantization · Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer · MQ · 05 Jan 2021
  • BinaryBERT: Pushing the Limit of BERT Quantization · Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King · MQ · 31 Dec 2020
  • ShiftAddNet: A Hardware-Inspired Deep Network · Haoran You, Xiaohan Chen, Yongan Zhang, Chaojian Li, Sicheng Li, Zihao Liu, Zhangyang Wang, Yingyan Lin · OOD, MQ · 24 Oct 2020
  • Dynamic ReLU · Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, Zicheng Liu · 22 Mar 2020
  • Forward and Backward Information Retention for Accurate Binary Neural Networks · Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, F. Yu, Jingkuan Song · MQ · 24 Sep 2019
  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT · Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer · MQ · 12 Sep 2019
  • GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding · Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 20 Apr 2018
  • MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications · Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam · 3DH · 17 Apr 2017
  • Neural Architecture Search with Reinforcement Learning · Barret Zoph, Quoc V. Le · 05 Nov 2016