Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2112.02191
Cited By
NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference
3 December 2021
Joonsang Yu
Junki Park
Seongmin Park
Minsoo Kim
Sihwa Lee
Dong Hyun Lee
Jungwook Choi
Re-assign community
ArXiv
PDF
HTML
Papers citing
"NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference"
14 / 14 papers shown
Title
VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers
Run Wang
Gamze Islamoglu
Andrea Belano
Viviane Potocnik
Francesco Conti
Angelo Garofalo
Luca Benini
26
0
0
15 Apr 2025
LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
Guoyu Li
Shengyu Ye
C. L. P. Chen
Yang Wang
Fan Yang
Ting Cao
Cheng Liu
Mohamed M. Sabry
Mao Yang
MQ
128
0
0
18 Jan 2025
QuAKE: Speeding up Model Inference Using Quick and Approximate Kernels for Exponential Non-Linearities
Sai Kiran Narayanaswami
Gopalakrishnan Srinivasan
Balaraman Ravindran
VLM
62
0
0
30 Nov 2024
From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks
Xue Geng
Zhe Wang
Chunyun Chen
Qing Xu
Kaixin Xu
...
Zhenghua Chen
M. Aly
Jie Lin
Min-man Wu
Xiaoli Li
33
1
0
09 May 2024
NOVA: NoC-based Vector Unit for Mapping Attention Layers on a CNN Accelerator
Mohit Upadhyay
Rohan Juneja
Weng-Fai Wong
L. Peh
17
0
0
07 May 2024
ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters
Shiwei Liu
Guanchen Tao
Yifei Zou
Derek Chow
Zichen Fan
Kauna Lei
Bangfei Pan
Dennis Sylvester
Gregory Kielian
Mehdi Saligane
21
7
0
31 Jan 2024
NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference
Ruiqi Sun
Siwei Ye
Jie Zhao
Xin He
Yiran Li
An Zou
35
0
0
23 May 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
28
100
0
27 Feb 2023
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition
Xingcheng Song
Di Wu
Binbin Zhang
Zhiyong Wu
Wenpeng Li
...
Peng Zhang
Zhendong Peng
Fuping Pan
Changbao Zhu
Zhongqin Wu
19
2
0
31 Oct 2022
Approximate Computing and the Efficient Machine Learning Expedition
J. Henkel
Hai Helen Li
A. Raghunathan
M. Tahoori
Swagath Venkataramani
Xiaoxuan Yang
Georgios Zervakis
11
17
0
02 Oct 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim
A. Gholami
Albert Eaton Shaw
Nicholas Lee
K. Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
19
99
0
02 Jun 2022
A Transistor Operations Model for Deep Learning Energy Consumption Scaling Law
Chen Li
Antonios Tsourdos
Weisi Guo
AI4CE
20
1
0
30 May 2022
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
93
341
0
05 Jan 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,956
0
20 Apr 2018
1