QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

International Conference on Learning Representations (ICLR), 2024
12 October 2023
Jing Liu
Yazhe Niu
Xiuying Wei
Zhiwei Dong
Jianfei Cai
Bohan Zhuang
    MQ
arXiv:2310.08041 (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (5046★)

Papers citing "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

39 papers shown
Layer-Wise High-Impact Parameter Ratio Optimization in Post-Training Quantization for Large Language Models
Cuong Pham
Hoang Anh Dung
Cuong C. Nguyen
Trung Le
G. Carneiro
Thanh-Toan Do
MQ
179
0
0
21 Nov 2025
Improving the Straight-Through Estimator with Zeroth-Order Information
Ningfeng Yang
Tor M. Aamodt
FedML
321
1
0
27 Oct 2025
Tequila: Trapping-free Ternary Quantization for Large Language Models
Hong Huang
Decheng Wu
Rui Cen
Guanghua Yu
Z. Li
Kai Liu
Jianchen Zhu
Peng Chen
Xue Liu
Dapeng Wu
MQ
309
4
0
28 Sep 2025
Integrating Pruning with Quantization for Efficient Deep Neural Networks Compression
Sara Makenali
Babak Rokh
A. Azarpeyvand
MQ
172
1
0
04 Sep 2025
MixA-Q: Revisiting Activation Sparsity for Vision Transformers from a Mixed-Precision Quantization Perspective
Weitian Wang
Rai Shubham
Cecilia De La Parra
Akash Kumar
MQ
214
2
0
25 Jul 2025
GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration
Yuhang Li
Ruokai Yin
Donghyun Lee
Shiting Xiao
Priyadarshini Panda
MQ
494
30
0
03 Apr 2025
MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
Jinguang Wang
Jiangming Wang
Haifeng Sun
Tingting Yang
Zirui Zhuang
Wanyi Ning
Yuexi Yin
Q. Qi
Jianxin Liao
MQ, MoMe
238
3
0
07 Mar 2025
SpinQuant: LLM quantization with learned rotations
International Conference on Learning Representations (ICLR), 2025
Zechun Liu
Changsheng Zhao
Igor Fedorov
Bilge Soran
Dhruv Choudhary
Raghuraman Krishnamoorthi
Vikas Chandra
Yuandong Tian
Tijmen Blankevoort
MQ
698
309
0
21 Feb 2025
PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization
Mengzhao Chen
Lu Dong
Jiahao Wang
Yi Bin
Wenqi Shao
Ping Luo
MQ
367
2
0
28 Jan 2025
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Yining Qi
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
554
7
0
18 Dec 2024
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
Xu Ouyang
Tao Ge
Thomas Hartvigsen
Zhisong Zhang
Haitao Mi
Dong Yu
MQ
430
13
0
26 Nov 2024
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Janghwan Lee
Jiwoong Park
Jinseok Kim
Yongjik Kim
Jungju Oh
Jinwook Oh
Jungwook Choi
429
15
0
15 Nov 2024
TesseraQ: Ultra Low-Bit LLM Post-Training Quantization with Block Reconstruction
Yuhang Li
Priyadarshini Panda
MQ
272
4
0
24 Oct 2024
Scaling Laws For Mixed Quantization
Zeyu Cao
Boyang Gu
Cheng Zhang
Pedro Gimenes
Jianqiao Lu
Jianyi Cheng
Xitong Gao
Yiren Zhao
MQ
414
1
0
09 Oct 2024
QERA: an Analytical Framework for Quantization Error Reconstruction
Cheng Zhang
Jeffrey T. H. Wong
Can Xiao
George A. Constantinides
Yiren Zhao
MQ
219
13
0
08 Oct 2024
LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference
International Symposium on Computer Architecture (ISCA), 2025
Zhiwen Mo
Lei Wang
Jianyu Wei
Zhichen Zeng
Shijie Cao
...
Naifeng Jing
Ting Cao
Jilong Xue
Fan Yang
Mao Yang
430
4
0
12 Aug 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Mengzhao Chen
Wenqi Shao
Peng Xu
Jiahao Wang
Shiyang Feng
Kaipeng Zhang
Ping Luo
MQ
661
109
0
10 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
625
196
0
09 Jul 2024
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing
Boyan Gao
Zheng Zhang
David A. Clifton
Shitao Xiao
Li Du
Guoqi Li
Jiajun Zhang
530
23
0
05 Jul 2024
OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
Jinguang Wang
Yuexi Yin
Haifeng Sun
Qi Qi
Jingyu Wang
Zirui Zhuang
Tingting Yang
Jianxin Liao
221
4
0
27 Jun 2024
ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
Jing Liu
Yazhe Niu
Mingyang Zhang
Yefei He
Jianfei Cai
Bohan Zhuang
MoE
233
3
0
13 Jun 2024
Low-Rank Quantization-Aware Training for LLMs
Yelysei Bondarenko
Riccardo Del Chiaro
Markus Nagel
MQ
449
46
0
10 Jun 2024
PTQ4DiT: Post-training Quantization for Diffusion Transformers
Junyi Wu
Haoxuan Wang
Yuzhang Shang
Mubarak Shah
Yan Yan
MQ
392
56
0
25 May 2024
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Akide Liu
Jing Liu
Zizheng Pan
Yefei He
Gholamreza Haffari
Bohan Zhuang
MQ
335
90
0
23 May 2024
ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
Neural Information Processing Systems (NeurIPS), 2024
Yefei He
Luoming Zhang
Weijia Wu
Jing Liu
Hong Zhou
Bohan Zhuang
MQ
380
60
0
23 May 2024
ReALLM: A general framework for LLM compression and fine-tuning
Louis Leconte
Lisa Bedin
Van Minh Nguyen
Eric Moulines
MQ
360
5
0
21 May 2024
OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning
Dan Qiao
Yi Su
Pinzheng Wang
Jing Ye
Wen Xie
...
Wenliang Chen
Guohong Fu
Guodong Zhou
Qiaoming Zhu
Min Zhang
MQ
282
1
0
09 May 2024
Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
International Conference on Machine Learning (ICML), 2024
Jordan Dotzel
Yuzong Chen
Bahaa Kotb
Sushma Prasad
Gang Wu
Sheng Li
Mohamed S. Abdelfattah
Zhiru Zhang
426
20
0
06 May 2024
PatentGPT: A Large Language Model for Intellectual Property
Zilong Bai
Ruiji Zhang
Linqing Chen
Qijun Cai
Yuan Zhong
...
Fu Bian
Xiaolong Gu
Lisha Zhang
Wentao Wu
Changyang Tu
525
9
0
28 Apr 2024
How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training
Jaeseong You
Minseop Park
Kyunggeun Lee
Seokjun An
Chirag I. Patel
Markus Nagel
MQ
232
3
0
25 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu Wang
483
205
0
22 Apr 2024
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Zechun Liu
Changsheng Zhao
Forrest N. Iandola
Chen Lai
Yuandong Tian
...
Ernie Chang
Yangyang Shi
Raghuraman Krishnamoorthi
Liangzhen Lai
Vikas Chandra
ALM
392
212
0
22 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
365
93
0
15 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
569
74
0
05 Feb 2024
LQER: Low-Rank Quantization Error Reconstruction for LLMs
Cheng Zhang
Jianyi Cheng
George A. Constantinides
Yiren Zhao
MQ
466
30
0
04 Feb 2024
Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production
Chandra Irugalbandara
Ashish Mahendra
Roland Daynauth
T. Arachchige
Kugesan Sivasothynathan
Krisztian Flautner
Lingjia Tang
Yiping Kang
Jason Mars
ELM
398
56
0
20 Dec 2023
CBQ: Cross-Block Quantization for Large Language Models
International Conference on Learning Representations (ICLR), 2024
Xin Ding
Xiaoyu Liu
Zhijun Tu
Yun-feng Zhang
Wei Li
...
Hanting Chen
Yehui Tang
Zhiwei Xiong
Baoqun Yin
Yunhe Wang
MQ
888
35
0
13 Dec 2023
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Conference on Machine Learning and Systems (MLSys), 2024
Zhixu Du
Shiyu Li
Yuhao Wu
Xiangyu Jiang
Jingwei Sun
Qilin Zheng
Yongkai Wu
Ang Li
Hai Helen Li
Yiran Chen
MoE
455
39
0
29 Oct 2023
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
International Conference on Learning Representations (ICLR), 2024
Wenqi Shao
Mengzhao Chen
Zhaoyang Zhang
Peng Xu
Lirui Zhao
Zhiqiang Li
Kaipeng Zhang
Shiyang Feng
Yu Qiao
Ping Luo
MQ
596
374
0
25 Aug 2023