arXiv:2101.01321
I-BERT: Integer-only BERT Quantization
5 January 2021
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
Papers citing "I-BERT: Integer-only BERT Quantization"
14 / 14 papers shown
NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities
James Read
Ming-Yen Lee
Wei-Hsing Huang
Yuan-Chun Luo
A. Lu
Shimeng Yu
17
0
0
05 May 2025
Low-Bit Integerization of Vision Transformers using Operand Reordering for Efficient Hardware
Ching-Yi Lin
Sahil Shah
MQ
54
0
0
11 Apr 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
100
3
0
03 Mar 2025
CipherPrune: Efficient and Scalable Private Transformer Inference
Yancheng Zhang
J. Xue
Mengxin Zheng
Mimi Xie
Mingzhe Zhang
Lei Jiang
Qian Lou
38
2
0
24 Feb 2025
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Divya J. Bajpai
M. Hanawal
58
0
0
02 Feb 2025
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia Wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
60
18
0
03 Oct 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
17
114
0
26 Jan 2024
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee
Sandipan Dandapat
Monojit Choudhury
T. Ganu
Kalika Bali
14
5
0
27 Oct 2022
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li
Kazuki Osawa
Torsten Hoefler
66
26
0
14 Sep 2022
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai
Wei Zhang
Lu Hou
Lifeng Shang
Jing Jin
Xin Jiang
Qun Liu
Michael Lyu
Irwin King
MQ
130
183
0
31 Dec 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
207
196
0
07 Feb 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
214
505
0
12 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,003
0
20 Apr 2018