arXiv: 2309.05516
Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
11 September 2023
Wenhua Cheng
Weiwei Zhang
Haihao Shen
Yiyang Cai
Xin He
Kaokao Lv
Yi Liu
MQ
Papers citing "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
23 / 23 papers shown
Title
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
Minsu Kim
Seongmin Hong
RyeoWook Ko
S. Choi
Hunjong Lee
Junsoo Kim
J. Kim
Jongse Park
57
0
0
24 Mar 2025
LoXR: Performance Evaluation of Locally Executing LLMs on XR Devices
Dawar Khan
Xinyu Liu
Omar Mena
Donggang Jia
Alexandre Kouyoumdjian
I. Viola
41
0
0
13 Feb 2025
Optimizing Large Language Model Training Using FP4 Quantization
Ruizhe Wang
Yeyun Gong
Xiao Liu
Guoshuai Zhao
Ziyue Yang
Baining Guo
Zhengjun Zha
Peng Cheng
MQ
67
4
0
28 Jan 2025
A dynamic parallel method for performance optimization on hybrid CPUs
Luo Yu
Liu Yucheng
Shen Haihao
51
0
0
29 Nov 2024
Interactions Across Blocks in Post-Training Quantization of Large Language Models
Khasmamad Shabanovi
Lukas Wiest
Vladimir Golkov
Daniel Cremers
Thomas Pfeil
MQ
21
1
0
06 Nov 2024
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
Yuhang Li
Priyadarshini Panda
MQ
26
1
0
24 Oct 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Mengzhao Chen
Wenqi Shao
Peng Xu
Jiahao Wang
Peng Gao
Kaipeng Zhang
Yu Qiao
Ping Luo
MQ
36
22
0
10 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
35
41
0
09 Jul 2024
Perturbation-Restrained Sequential Model Editing
Junjie Ma
Hong Wang
Haoyang Xu
Zhen-Hua Ling
Jia-Chen Gu
KELM
53
8
0
27 May 2024
Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind
Hongchuan Zeng
Hongshen Xu
Lu Chen
Kai Yu
47
5
0
06 Apr 2024
AffineQuant: Affine Transformation Quantization for Large Language Models
Yuexiao Ma
Huixia Li
Xiawu Zheng
Feng Ling
Xuefeng Xiao
Rui Wang
Shilei Wen
Fei Chao
Rongrong Ji
MQ
38
17
0
19 Mar 2024
Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Thomas Mesnard
Cassidy Hardin
Robert Dadashi
Surya Bhupatiraju
...
Armand Joulin
Noah Fiedel
Evan Senter
Alek Andreev
Kathleen Kenealy
VLM
LLMAG
129
423
0
13 Mar 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
36
46
0
15 Feb 2024
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng
Jerry Chee
Qingyao Sun
Volodymyr Kuleshov
Christopher De Sa
MQ
126
92
0
06 Feb 2024
ReLU² Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang
Yixin Song
Guanghui Yu
Xu Han
Yankai Lin
Chaojun Xiao
Chenyang Song
Zhiyuan Liu
Zeyu Mi
Maosong Sun
15
31
0
06 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
29
27
0
05 Feb 2024
Effective Quantization for Diffusion Models on CPUs
Hanwen Chang
Haihao Shen
Yiyang Cai
Xinyu Ye
Zhenzhong Xu
Wenhua Cheng
Kaokao Lv
Weiwei Zhang
Yintong Lu
Heng Guo
MQ
22
7
0
02 Nov 2023
Efficient LLM Inference on CPUs
Haihao Shen
Hanwen Chang
Bo Dong
Yu Luo
Hengyu Meng
MQ
15
17
0
01 Nov 2023
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Zhixu Du
Shiyu Li
Yuhao Wu
Xiangyu Jiang
Jingwei Sun
Qilin Zheng
Yongkai Wu
Ang Li
Hai Helen Li
Yiran Chen
MoE
18
12
0
29 Oct 2023
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
Jing Liu
Ruihao Gong
Xiuying Wei
Zhiwei Dong
Jianfei Cai
Bohan Zhuang
MQ
20
49
0
12 Oct 2023
A Survey on Model Compression for Large Language Models
Xunyu Zhu
Jian Li
Yong Liu
Can Ma
Weiping Wang
24
189
0
15 Aug 2023
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Z. Yao
Xiaoxia Wu
Cheng-rong Li
Stephen Youn
Yuxiong He
MQ
63
57
0
15 Mar 2023
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,986
0
31 Dec 2020