Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
12 July 2023
James O'Neill, Sourav Dutta
VLM
MQ
arXiv: 2307.05972
Papers citing "Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models" (4 papers)
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
MQ
15 Feb 2024

I-BERT: Integer-only BERT Quantization
Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer
MQ
05 Jan 2021

BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
MQ
31 Dec 2020

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen
MQ
10 Feb 2017