ResearchTrend.AI

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
arXiv:2211.10438 · 18 November 2022
Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han
Tags: MQ

Papers citing "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models"

50 / 526 papers shown
Mixture of Experts with Mixture of Precisions for Tuning Quality of Service
  HamidReza Imani, Abdolah Amirany, Tarek A. El-Ghazawi [MoE] · 19 Jul 2024

SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization
  Rui Xie, Asad Ul Haq, Linsen Ma, Krystal Sun, Sanchari Sen, Swagath Venkataramani, Liu Liu, Tong Zhang [MQ] · 17 Jul 2024

Mamba-PTQ: Outlier Channels in Recurrent Large Language Models
  Alessandro Pierro, Steven Abreu [MQ, Mamba] · 17 Jul 2024

Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
  Chen Ju, Haicheng Wang, Haozhe Cheng, Xu Chen, Zhonghua Zhai, Weilin Huang, Jinsong Lan, Shuai Xiao, Bo Zheng [VLM] · 16 Jul 2024

MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models
  Hongrong Cheng, Miao Zhang, J. Q. Shi · 16 Jul 2024

Real-Time Anomaly Detection and Reactive Planning with Large Language Models
  Rohan Sinha, Amine Elhafsi, Christopher Agia, Matthew Foutter, Edward Schmerling, Marco Pavone [OffRL, LRM] · 11 Jul 2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  Mengzhao Chen, Wenqi Shao, Peng Xu, Jiahao Wang, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo [MQ] · 10 Jul 2024

Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
  Rulin Shao, Jacqueline He, Akari Asai, Weijia Shi, Tim Dettmers, Sewon Min, Luke Zettlemoyer, Pang Wei Koh [RALM] · 09 Jul 2024

Composable Interventions for Language Models
  Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, ..., Anurag J. Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen [KELM, MU] · 09 Jul 2024

Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
  Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang · 09 Jul 2024

On the Limitations of Compute Thresholds as a Governance Strategy
  Sara Hooker · 08 Jul 2024

Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations
  Bowen Shen, Zheng-Shen Lin, Daren Zha, Wei Liu, Jian Luan, Bin Wang, Weiping Wang · 08 Jul 2024

The Impact of Quantization and Pruning on Deep Reinforcement Learning Models
  Heng Lu, Mehdi Alemi, Reza Rawassizadeh · 05 Jul 2024

Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions
  Shumaila Javaid, R. A. Khalil, Nasir Saeed, Bin He, Mohamed-Slim Alouini · 05 Jul 2024

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
  Xingrun Xing, Boyan Gao, Zheng Zhang, David A. Clifton, Shitao Xiao, LI DU, Guoqi Li, Jiajun Zhang · 05 Jul 2024

GPTQT: Quantize Large Language Models Twice to Push the Efficiency
  Yipin Guo, Yilin Lang, Qinyuan Ren [MQ] · 03 Jul 2024

Let the Code LLM Edit Itself When You Edit the Code
  Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Z. Zhang, Di He [KELM] · 03 Jul 2024

VcLLM: Video Codecs are Secretly Tensor Codecs
  Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills · 29 Jun 2024

Teola: Towards End-to-End Optimization of LLM-based Applications
  Xin Tan, Yimin Jiang, Yitao Yang, Hong-Yu Xu · 29 Jun 2024

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
  Wonbeom Lee, Jungi Lee, Junghwan Seo, Jaewoong Sim [RALM] · 28 Jun 2024

LLMEasyQuant: Scalable Quantization for Parallel and Distributed LLM Inference
  Dong Liu, Meng Jiang [MQ] · 28 Jun 2024

OutlierTune: Efficient Channel-Wise Quantization for Large Language Models
  Jinguang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, Jingyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao · 27 Jun 2024

MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
  Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, ..., Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan [LLMAG] · 25 Jun 2024

Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels
  Razvan-Gabriel Dumitru, Vikas Yadav, Rishabh Maheshwary, Paul-Ioan Clotan, Sathwik Tejaswi Madhusudhan, Mihai Surdeanu [MQ] · 25 Jun 2024

Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
  Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu [MQ] · 25 Jun 2024

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
  Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang [MQ] · 25 Jun 2024

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
  Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui · 24 Jun 2024

Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
  Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal [MoMe] · 24 Jun 2024

Evaluation of Language Models in the Medical Context Under Resource-Constrained Settings
  Andrea Posada, Daniel Rueckert, Felix Meissen, Philip Muller [LM&MA, ELM] · 24 Jun 2024

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
  Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, ..., Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang [MQ] · 21 Jun 2024

SDQ: Sparse Decomposed Quantization for LLM Inference
  Geonhwa Jeong, Po-An Tsai, S. Keckler, Tushar Krishna [MQ] · 19 Jun 2024

Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
  Ke Cheng, Wen Hu, Zhi Wang, Hongen Peng, Jianguo Li, Sheng Zhang · 19 Jun 2024

VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework
  Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia · 19 Jun 2024

BoA: Attention-aware Post-training Quantization without Backpropagation
  Junhan Kim, Ho-Young Kim, Eulrang Cho, Chungman Lee, Joonyoung Kim, Yongkweon Jeon [MQ] · 19 Jun 2024

Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates
  Cristian Meo, Ksenia Sycheva, Anirudh Goyal, Justin Dauwels [MQ] · 18 Jun 2024

Endor: Hardware-Friendly Sparse Format for Offloaded LLM Inference
  Donghyeon Joo, Ramyad Hadidi, S. Feizi, Bahar Asgari [MQ] · 17 Jun 2024

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
  Qianchao Zhu, Jiangfei Duan, Chang Chen, Siran Liu, Xiuhong Li, ..., Huanqi Cao, Xiao Chuanfu, Xingcheng Zhang, Dahua Lin, Chao Yang · 17 Jun 2024

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
  Wenshuo Li, Xinghao Chen, Han Shu, Yehui Tang, Yunhe Wang [MQ] · 17 Jun 2024

An Analysis on Quantizing Diffusion Transformers
  Yuewei Yang, Jialiang Wang, Xiaoliang Dai, Peizhao Zhang, Hongbo Zhang [MQ] · 16 Jun 2024

Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
  Dominik Wagner, Ilja Baumann, K. Riedhammer, Tobias Bocklet [MQ] · 16 Jun 2024

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
  Jungi Lee, Wonbeom Lee, Jaewoong Sim [MQ] · 16 Jun 2024

Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox
  Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu [MQ] · 15 Jun 2024

ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
  Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang [MoE] · 13 Jun 2024

ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models
  Xiang Meng, Kayhan Behdin, Haoyue Wang, Rahul Mazumder · 12 Jun 2024

OPTune: Efficient Online Preference Tuning
  Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Tianyi Zhou, Heng Huang · 11 Jun 2024

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
  Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin · 11 Jun 2024

TernaryLLM: Ternarized Large Language Model
  Tianqi Chen, Zhe Li, Weixiang Xu, Zeyu Zhu, Dong Li, Lu Tian, E. Barsoum, Peisong Wang, Jian Cheng · 11 Jun 2024

MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations
  Zixiao Wang, Jingwei Zhang, Wenqian Zhao, Farzan Farnia, Bei Yu [AAML] · 11 Jun 2024

Low-Rank Quantization-Aware Training for LLMs
  Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel [MQ] · 10 Jun 2024

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
  Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Y. Lin [KELM] · 10 Jun 2024