A$^3$: Accelerating Attention Mechanisms in Neural Networks with Approximation

22 February 2020
Tae Jun Ham, Sungjun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, Yoonho Song, Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W. Lee, D. Jeong

Papers citing "A$^3$: Accelerating Attention Mechanisms in Neural Networks with Approximation"

50 / 76 papers shown
• LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization
  Seunghee Han, S. Choi, J. Kim (09 May 2025)
• Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers
  Chaojian Li, Sixu Li, Linrui Jiang, Jingqun Zhang, Yingyan Lin (31 Mar 2025)
• A Low-Power Streaming Speech Enhancement Accelerator For Edge Devices
  Ci-Hao Wu, Tian-Sheuan Chang (27 Mar 2025)
• QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
  Xuan Shen, Weize Ma, Jing Liu, Changdi Yang, Rui Ding, ..., Wei Niu, Yanzhi Wang, Pu Zhao, Jun Lin, Jiuxiang Gu (20 Mar 2025)
• Attention Condensation via Sparsity Induced Regularized Training
  Eli Sason, Darya Frolova, Boris Nazarov, Felix Goldberd (03 Mar 2025)
• PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System
  Yintao He, Haiyu Mao, Christina Giannoula, Mohammad Sadrosadati, Juan Gómez Luna, Huawei Li, Xiaowei Li, Ying Wang, O. Mutlu (21 Feb 2025)
• Top-Theta Attention: Sparsifying Transformers by Compensated Thresholding
  Konstantin Berestizshevsky, Renzo Andri, Lukas Cavigelli (12 Feb 2025)
• EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models
  Jaehoon Heo, Adiwena Putra, Jieon Yoon, Sungwoong Yune, Hangyeol Lee, Ji-Hoon Kim, Joo-Young Kim (10 Jan 2025)
• Deploying Foundation Model Powered Agent Services: A Survey
  Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Haozhao Wang, Yunfeng Fan, Qinliang Su, Xuemin Shen (18 Dec 2024)
• EXAQ: Exponent Aware Quantization For LLMs Acceleration
  Moran Shkolnik, Maxim Fishman, Brian Chmiel, Hilla Ben-Yaacov, Ron Banner, Kfir Y. Levy (04 Oct 2024)
• FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs
  Ehsan Kabir, Md. Arafat Kabir, Austin R. J. Downey, Jason D. Bakos, David Andrews, Miaoqing Huang (21 Sep 2024)
• ProTEA: Programmable Transformer Encoder Acceleration on FPGA
  Ehsan Kabir, Jason D. Bakos, David Andrews, Miaoqing Huang (21 Sep 2024)
• An Analog and Digital Hybrid Attention Accelerator for Transformers with Charge-based In-memory Computing
  Ashkan Moradifirouzabadi, Divya Sri Dodla, Mingu Kang (08 Sep 2024)
• Hardware Acceleration of LLMs: A comprehensive survey and comparison
  Nikoletta Koilia, C. Kachris (05 Sep 2024)
• Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
  Sungmin Yun, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim, Byeongho Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn (02 Sep 2024)
• LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
  Jaehong Cho, Minsu Kim, Hyunmin Choi, Guseul Heo, Jongse Park (10 Aug 2024)
• 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data
  Calvin Tan, Jerome Wang (07 Aug 2024)
• KWT-Tiny: RISC-V Accelerated, Embedded Keyword Spotting Transformer
  Aness Al-Qawlaq, Ajay Kumar, Deepu John (22 Jul 2024)
• Hybrid Dynamic Pruning: A Pathway to Efficient Transformer Inference
  Ghadeer Jaradat, M. Tolba, Ghada Alsuhli, Hani Saleh, Mahmoud Al-Qutayri, Thanos Stouraitis, Baker Mohammad (17 Jul 2024)
• Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment
  Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang (16 Jul 2024)
• MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
  Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, ..., Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang (21 Jun 2024)
• LARS-VSA: A Vector Symbolic Architecture For Learning with Abstract Rules
  Mohamed Mejri, C. Amarnath, Abhijit Chatterjee (23 May 2024)
• The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
  Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan (18 May 2024)
• From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks
  Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, ..., Zhenghua Chen, M. Aly, Jie Lin, Min-man Wu, Xiaoli Li (09 May 2024)
• QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
  Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han (07 May 2024)
• LATTE: Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer
  Jiing-Ping Wang, Ming-Guang Lin, An-Yeu Wu (11 Apr 2024)
• Lightweight Deep Learning for Resource-Constrained Environments: A Survey
  Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Hui Li, Wen-Huang Cheng (08 Apr 2024)
• CHAI: Clustered Head Attention for Efficient LLM Inference
  Saurabh Agarwal, Bilge Acun, Basil Homer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu (12 Mar 2024)
• Stochastic Spiking Attention: Accelerating Attention with Stochastic Computing in Spiking Networks
  Zihang Song, Prabodh Katti, Osvaldo Simeone, Bipin Rajendran (14 Feb 2024)
• Compressing Deep Reinforcement Learning Networks with a Dynamic Structured Pruning Method for Autonomous Driving
  Wensheng Su, Zhenni Li, Minrui Xu, Jiawen Kang, Dusit Niyato, Shengli Xie (07 Feb 2024)
• ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
  Shiwei Liu, Guanchen Tao, Yifei Zou, Derek Chow, Zichen Fan, Kauna Lei, Bangfei Pan, Dennis Sylvester, Gregory Kielian, Mehdi Saligane (31 Jan 2024)
• BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge
  Yuhao Ji, Chao Fang, Zhongfeng Wang (22 Jan 2024)
• A Survey on Hardware Accelerators for Large Language Models
  C. Kachris (18 Jan 2024)
• FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAs
  Shulin Zeng, Jun Liu, Guohao Dai, Xinhao Yang, Tianyu Fu, ..., Zehao Wang, Ruoyu Zhang, Kairui Wen, Xuefei Ning, Yu Wang (08 Jan 2024)
• A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models
  Harsh Sharma, Pratyush Dhingra, J. Doppa, Ümit Y. Ogras, P. Pande (18 Dec 2023)
• The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
  Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang (01 Dec 2023)
• Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads
  Hongxiang Fan, Stylianos I. Venieris, Alexandros Kouris, Nicholas D. Lane (17 Oct 2023)
• Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
  Wenqi Jiang, Marco Zeller, R. Waleffe, Torsten Hoefler, Gustavo Alonso (15 Oct 2023)
• A Survey of Techniques for Optimizing Transformer Inference
  Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani (16 Jul 2023)
• NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference
  Ruiqi Sun, Siwei Ye, Jie Zhao, Xin He, Yiran Li, An Zou (23 May 2023)
• Towards Efficient Multi-Agent Learning Systems
  Kailash Gogineni, Peng Wei, Tian-Shing Lan, Guru Venkataramani (22 May 2023)
• Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
  Yijia Zhang, Lingran Zhao, Shijie Cao, Wenqiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu (21 May 2023)
• SwiftTron: An Efficient Hardware Accelerator for Quantized Transformers
  Alberto Marchisio, David Durà, Maurizio Capra, Maurizio Martina, Guido Masera, Muhammad Shafique (08 Apr 2023)
• TransCODE: Co-design of Transformers and Accelerators for Efficient Training and Inference
  Shikhar Tuli, N. Jha (27 Mar 2023)
• X-Former: In-Memory Acceleration of Transformers
  S. Sridharan, Jacob R. Stevens, Kaushik Roy, A. Raghunathan (13 Mar 2023)
• AccelTran: A Sparsity-Aware Accelerator for Dynamic Inference with Transformers
  Shikhar Tuli, N. Jha (28 Feb 2023)
• Full Stack Optimization of Transformer Inference: a Survey
  Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ..., Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, A. Gholami (27 Feb 2023)
• Soft Error Reliability Analysis of Vision Transformers
  Xing-xiong Xue, Cheng Liu, Ying Wang, Bing Yang, Tao Luo, L. Zhang, Huawei Li, Xiaowei Li (21 Feb 2023)
• A Survey on Efficient Training of Transformers
  Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen (02 Feb 2023)
• AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems
  Yuan Feng, Hyeran Jeon, F. Blagojevic, Cyril Guyot, Qing Li, Dong Li (23 Jan 2023)