Efficient Softmax Approximation for Deep Neural Networks with Attention Mechanism
Ihor Vasyltsov, Wooseok Chang
arXiv:2111.10770, 21 November 2021
Papers citing "Efficient Softmax Approximation for Deep Neural Networks with Attention Mechanism" (7 of 7 papers shown):

1. EXAQ: Exponent Aware Quantization For LLMs Acceleration [MQ] (04 Oct 2024)
   Moran Shkolnik, Maxim Fishman, Brian Chmiel, Hilla Ben-Yaacov, Ron Banner, Kfir Y. Levy

2. KWT-Tiny: RISC-V Accelerated, Embedded Keyword Spotting Transformer (22 Jul 2024)
   Aness Al-Qawlaq, Ajay Kumar, Deepu John

3. Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT [AI4TS, MQ] (06 Jul 2024)
   Tianheng Ling, Chao Qian, Gregor Schiele

4. SimA: Simple Softmax-free Attention for Vision Transformers (17 Jun 2022)
   Soroush Abbasi Koohpayegani, Hamed Pirsiavash

5. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT [MQ] (12 Sep 2019)
   Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

6. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding [ELM] (20 Apr 2018)
   Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

7. OpenNMT: Open-Source Toolkit for Neural Machine Translation (10 Jan 2017)
   Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, Alexander M. Rush