arXiv:2203.10705
Compression of Generative Pre-trained Language Models via Quantization
21 March 2022
Chaofan Tao
Lu Hou
Wei Zhang
Lifeng Shang
Xin Jiang
Qun Liu
Ping Luo
Ngai Wong
MQ
Papers citing "Compression of Generative Pre-trained Language Models via Quantization"
50 / 73 papers shown
ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration
Mengting Ai
Tianxin Wei
Yifan Chen
Zhichen Zeng
Ritchie Zhao
G. Varatkar
B. Rouhani
Xianfeng Tang
Hanghang Tong
Jingrui He
MoE
39
1
0
10 Mar 2025
MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
Jinguang Wang
J. Wang
Haifeng Sun
Tingting Yang
Zirui Zhuang
Wanyi Ning
Yuexi Yin
Q. Qi
Jianxin Liao
MQ
MoMe
38
0
0
07 Mar 2025
Hands-On Tutorial: Labeling with LLM and Human-in-the-Loop
Ekaterina Artemova
Akim Tsvigun
Dominik Schlechtweg
Natalia Fedorova
Konstantin Chernyshev
Sergei Tilga
Boris Obmoroshev
SyDa
VLM
63
0
0
28 Jan 2025
The Efficiency vs. Accuracy Trade-off: Optimizing RAG-Enhanced LLM Recommender Systems Using Multi-Head Early Exit
Huixue Zhou
Hengrui Gu
Xi Liu
Kaixiong Zhou
Mingfu Liang
...
Wen-Yen Chen
Yiping Han
Bo Long
Rui Zhang
Tianlong Chen
3DV
31
1
0
04 Jan 2025
PIM-AI: A Novel Architecture for High-Efficiency LLM Inference
Cristobal Ortega
Yann Falevoz
Renaud Ayrignac
66
0
0
26 Nov 2024
Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
David Heurtel-Depeiges
Anian Ruoss
Joel Veness
Tim Genewein
12
1
0
07 Oct 2024
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization
Shimao Chen
Zirui Liu
Zhiying Wu
Ce Zheng
Peizhuang Cong
Zihan Jiang
Yuhan Wu
Lei Su
Tong Yang
MQ
VLM
34
3
0
25 Sep 2024
Towards Efficient Large Language Models for Scientific Text: A Review
H. To
Ming Liu
Guangyan Huang
35
1
0
20 Aug 2024
Graph-Structured Speculative Decoding
Zhuocheng Gong
Jiahao Liu
Ziyue Wang
Pengfei Wu
Jingang Wang
Xunliang Cai
Dongyan Zhao
Rui Yan
13
3
0
23 Jul 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Chaofan Tao
Qian Liu
Longxu Dou
Niklas Muennighoff
Zhongwei Wan
Ping Luo
Min-Bin Lin
Ngai Wong
PILM
50
40
0
18 Jul 2024
Minimizing PLM-Based Few-Shot Intent Detectors
Haode Zhang
Xiao-Ming Wu
Albert Y. S. Lam
VLM
22
0
0
13 Jul 2024
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Jinguang Wang
Yuexi Yin
Haifeng Sun
Qi Qi
Jingyu Wang
Zirui Zhuang
Tingting Yang
Jianxin Liao
25
2
0
27 Jun 2024
D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Zhongwei Wan
Xinjian Wu
Yu Zhang
Yi Xin
Chaofan Tao
...
Xin Wang
Siqi Luo
Jing Xiong
Mi Zhang
16
0
0
18 Jun 2024
Privacy-Aware Randomized Quantization via Linear Programming
Zhongteng Cai
Xueru Zhang
Mohammad Mahdi Khalili
25
1
0
01 Jun 2024
HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models
R. Sukthanker
Arber Zela
B. Staffler
Aaron Klein
Lennart Purucker
Jorg K. H. Franke
Frank Hutter
ELM
25
3
0
16 May 2024
When Quantization Affects Confidence of Large Language Models?
Irina Proskurina
Luc Brun
Guillaume Metzler
Julien Velcin
MQ
16
2
0
01 May 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
40
45
0
08 Apr 2024
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
Taiqiang Wu
Chaofan Tao
Jiahao Wang
Zhe Zhao
Ngai Wong
ALM
33
14
0
03 Apr 2024
Train & Constrain: Phonologically Informed Tongue-Twister Generation from Topics and Paraphrases
Tyler Loakman
Chen Tang
Chenghua Lin
25
4
0
20 Mar 2024
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
Hyungjun Oh
Kihong Kim
Jaemin Kim
Sungkyun Kim
Junyeol Lee
Du-Seong Chang
Jiwon Seo
25
27
0
15 Mar 2024
The Impact of Quantization on the Robustness of Transformer-based Text Classifiers
Seyed Parsa Neshaei
Yasaman Boreshban
Gholamreza Ghassem-Sani
Seyed Abolghasem Mirroshandel
MQ
28
0
0
08 Mar 2024
A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu
Ming Li
Chongyang Tao
Tao Shen
Reynold Cheng
Jinyang Li
Can Xu
Dacheng Tao
Tianyi Zhou
KELM
VLM
32
94
0
20 Feb 2024
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning
Gyeongman Kim
Doohyuk Jang
Eunho Yang
VLM
16
3
0
20 Feb 2024
Head-wise Shareable Attention for Large Language Models
Zouying Cao
Yifei Yang
Hai Zhao
22
3
0
19 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
28
30
0
15 Feb 2024
RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li
Xuewen Liu
Jing Zhang
Qingyi Gu
MQ
21
7
0
08 Feb 2024
Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward
Arnav Chavan
Raghav Magazine
Shubham Kushwaha
M. Debbah
Deepak Gupta
4
18
0
02 Feb 2024
A Comprehensive Survey of Compression Algorithms for Language Models
Seungcheol Park
Jaehyeon Choi
Sojin Lee
U. Kang
MQ
14
10
0
27 Jan 2024
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
E. Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLM
LRM
AI4CE
14
74
0
17 Dec 2023
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Xiaoxia Wu
Haojun Xia
Stephen Youn
Zhen Zheng
Shiyang Chen
...
Reza Yazdani Aminabadi
Yuxiong He
Olatunji Ruwase
Leon Song
Zhewei Yao
58
8
0
14 Dec 2023
The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models
Srinath Namburi
Makesh Narsimhan Sreedhar
Srinath Srinivasan
Frederic Sala
MQ
24
8
0
01 Dec 2023
Mini Minds: Exploring Bebeshka and Zlata Baby Models
Irina Proskurina
Guillaume Metzler
Julien Velcin
ALM
11
1
0
06 Nov 2023
ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers
Zhewei Yao
Reza Yazdani Aminabadi
Stephen Youn
Xiaoxia Wu
Elton Zheng
Yuxiong He
MQ
11
1
0
26 Oct 2023
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
Jiduan Liu
Jiahao Liu
Qifan Wang
Jingang Wang
Xunliang Cai
Dongyan Zhao
R. Wang
Rui Yan
6
4
0
24 Oct 2023
CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
Kaiyan Zhang
Ning Ding
Biqing Qi
Xuekai Zhu
Xinwei Long
Bowen Zhou
38
4
0
24 Oct 2023
Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models
Miaoxi Zhu
Qihuang Zhong
Li Shen
Liang Ding
Juhua Liu
Bo Du
Dacheng Tao
MQ
VLM
21
1
0
20 Oct 2023
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue
Nikolaos Aletras
15
0
0
11 Oct 2023
Language Modeling Is Compression
Grégoire Delétang
Anian Ruoss
Paul-Ambroise Duquenne
Elliot Catt
Tim Genewein
...
Wenliang Kevin Li
Matthew Aitchison
Laurent Orseau
Marcus Hutter
J. Veness
AI4CE
6
128
0
19 Sep 2023
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Minsoo Kim
Sihwa Lee
Jangwhan Lee
S. Hong
Duhyeuk Chang
Wonyong Sung
Jungwook Choi
MQ
11
14
0
13 Aug 2023
MSAC: Multiple Speech Attribute Control Method for Reliable Speech Emotion Recognition
Y. Pan
Yuguang Yang
Yuheng Huang
Jixun Yao
Jingjing Yin
Yanni Hu
Heng Lu
Lei Ma
Jianjun Zhao
12
5
0
08 Aug 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
30
499
0
12 Jul 2023
FedYolo: Augmenting Federated Learning with Pretrained Transformers
Xuechen Zhang
Mingchen Li
Xiangyu Chang
Jiasi Chen
A. Roy-Chowdhury
A. Suresh
Samet Oymak
FedML
16
7
0
10 Jul 2023
Toward the Cure of Privacy Policy Reading Phobia: Automated Generation of Privacy Nutrition Labels From Privacy Policies
Shidong Pan
Thong Hoang
Dawen Zhang
Zhenchang Xing
Xiwei Xu
Qinghua Lu
Mark Staples
28
10
0
19 Jun 2023
A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text
Jessica Nayeli López Espejel
Mahaman Sanoussi Yahaya Alassan
El Mehdi Chouham
Walid Dahhane
E. Ettifouri
16
12
0
10 Jun 2023
Binary and Ternary Natural Language Generation
Zechun Liu
Barlas Oğuz
Aasish Pappu
Yangyang Shi
Raghuraman Krishnamoorthi
MQ
28
6
0
02 Jun 2023
PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models
Zhuocheng Gong
Jiahao Liu
Qifan Wang
Yang Yang
Jingang Wang
Wei Yu Wu
Yunsen Xian
Dongyan Zhao
Rui Yan
MQ
25
5
0
30 May 2023
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi
Chaofan Tao
Anyi Rao
Zhendong Yang
Chun Yuan
Jiaqi Wang
VLM
15
22
0
27 May 2023
Task-agnostic Distillation of Encoder-Decoder Language Models
Chen Zhang
Yang Yang
Jingang Wang
Dawei Song
14
3
0
21 May 2023
Weight-Inherited Distillation for Task-Agnostic BERT Compression
Taiqiang Wu
Cheng-An Hou
Shanshan Lao
Jiayi Li
Ngai Wong
Zhe Zhao
Yujiu Yang
52
10
0
16 May 2023
Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Xiuying Wei
Yunchen Zhang
Yuhang Li
Xiangguo Zhang
Ruihao Gong
Jian Ren
Zhengang Li
MQ
8
29
0
18 Apr 2023