ResearchTrend.AI
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

18 November 2022
Guangxuan Xiao
Ji Lin
Mickael Seznec
Hao Wu
Julien Demouth
Song Han
    MQ

Papers citing "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models"

50 / 526 papers shown
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Haozhao Wang
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
112
1
0
18 Dec 2024
Accelerating Retrieval-Augmented Generation
Derrick Quinn
Mohammad Nouri
Neel Patel
John Salihu
Alireza Salemi
Sukhan Lee
Hamed Zamani
Mohammad Alian
RALM
3DV
81
2
0
14 Dec 2024
SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization
Runsheng Bai
Qiang Liu
B. Liu
MQ
59
1
0
05 Dec 2024
PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation
Ao Wang
Hui Chen
Jianchao Tan
K. Zhang
Xunliang Cai
Zijia Lin
J. Han
Guiguang Ding
VLM
77
3
0
04 Dec 2024
Multi-Bin Batching for Increasing LLM Inference Throughput
Ozgur Guldogan
Jackson Kunde
Kangwook Lee
Ramtin Pedarsani
LRM
62
2
0
03 Dec 2024
MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache
Akshat Sharma
Hangliang Ding
Jianping Li
Neel Dani
Minjia Zhang
74
1
0
27 Nov 2024
SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors
M. Rakka
J. Li
Guohao Dai
A. Eltawil
M. Fouda
Fadi J. Kurdahi
60
1
0
26 Nov 2024
PIM-AI: A Novel Architecture for High-Efficiency LLM Inference
Cristobal Ortega
Yann Falevoz
Renaud Ayrignac
76
1
0
26 Nov 2024
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
Chao Fang
Man Shi
Robin Geens
Arne Symons
Zhongfeng Wang
Marian Verhelst
69
0
0
24 Nov 2024
Nimbus: Secure and Efficient Two-Party Inference for Transformers
Zhengyi Li
Kang Yang
Jin Tan
Wen-jie Lu
Haoqi Wu
...
Yu Yu
Derun Zhao
Yancheng Zheng
M. Guo
Jingwen Leng
62
0
0
24 Nov 2024
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages
Bethel Melesse Tessema
Akhil Kedia
Tae-Sun Chung
67
0
0
21 Nov 2024
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers
Zehua Pei
Hui-Ling Zhen
Xianzhi Yu
Sinno Jialin Pan
M. Yuan
Bei Yu
AI4CE
84
0
0
21 Nov 2024
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Yuzong Chen
Ahmed F. AbouElhamayed
Xilai Dai
Yang Wang
Marta Andronic
G. Constantinides
Mohamed S. Abdelfattah
MQ
100
0
0
18 Nov 2024
Towards Accurate and Efficient Sub-8-Bit Integer Training
Wenjin Guo
Donglai Liu
Weiying Xie
Yunsong Li
Xuefei Ning
Zihan Meng
Shulin Zeng
Jie Lei
Zhenman Fang
Yu Wang
MQ
34
1
0
17 Nov 2024
Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms
Minghe Gao
Wendong Bu
Bingchen Miao
Yang Wu
Yunfei Li
Juncheng Billy Li
Siliang Tang
Qi Wu
Yueting Zhuang
Meng Wang
LM&Ro
33
3
0
17 Nov 2024
SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization
Jintao Zhang
Haofeng Huang
Pengle Zhang
Jia wei
Jun-Jie Zhu
Jianfei Chen
VLM
MQ
59
2
0
17 Nov 2024
SAM Decoding: Speculative Decoding via Suffix Automaton
Yuxuan Hu
Ke Wang
Jing Zhang
Fanjin Zhang
C. Li
H. Chen
Jing Zhang
42
1
0
16 Nov 2024
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
Y. Fu
Zhongzhi Yu
Junwei Li
Jiayi Qian
Yongan Zhang
Xiangchi Yuan
Dachuan Shi
Roman Yakunin
Y. Lin
24
2
0
15 Nov 2024
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee
Jiwoong Park
Jinseok Kim
Yongjik Kim
Jungju Oh
Jinwook Oh
Jungwook Choi
39
2
0
15 Nov 2024
The Super Weight in Large Language Models
Mengxia Yu
De Wang
Qi Shan
Colorado Reed
Alvin Wan
MQ
MILM
29
9
0
11 Nov 2024
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
C. Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin
MoMe
46
12
0
07 Nov 2024
Interactions Across Blocks in Post-Training Quantization of Large Language Models
Khasmamad Shabanovi
Lukas Wiest
Vladimir Golkov
Daniel Cremers
Thomas Pfeil
MQ
26
1
0
06 Nov 2024
The Unreasonable Effectiveness of LLMs for Query Optimization
Peter Akioyamen
Zixuan Yi
Ryan Marcus
29
2
0
05 Nov 2024
Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
Jason Vega
Junsheng Huang
Gaokai Zhang
Hangoo Kang
Minjia Zhang
Gagandeep Singh
34
0
0
05 Nov 2024
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Yang Yue
Yulin Wang
Bingyi Kang
Yizeng Han
Shenzhi Wang
Shiji Song
Jiashi Feng
Gao Huang
VLM
40
16
0
04 Nov 2024
Shrinking the Giant: Quasi-Weightless Transformers for Low Energy Inference
Shashank Nag
Alan T. L. Bacellar
Zachary Susskind
Anshul Jha
Logan Liberty
...
Krishnan Kailas
P. Lima
Neeraja J. Yadwadkar
F. M. G. França
L. John
33
0
0
04 Nov 2024
Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services
Zhang Liu
Hongyang Du
Xiangwang Hou
Lianfen Huang
Seyyedali Hosseinalipour
Dusit Niyato
K. B. Letaief
DiffM
34
1
0
03 Nov 2024
HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference
Peng Tang
Jiacheng Liu
X. Hou
Yifei Pu
Jing Wang
Pheng-Ann Heng
C. Li
M. Guo
MoE
59
7
0
03 Nov 2024
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
Xuanlin Jiang
Yang Zhou
Shiyi Cao
Ion Stoica
Minlan Yu
37
8
0
02 Nov 2024
A Comprehensive Study on Quantization Techniques for Large Language Models
Jiedong Lang
Zhehao Guo
Shuyu Huang
MQ
39
10
0
30 Oct 2024
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment
Ge Yang
Changyi He
J. Guo
Jianyu Wu
Yifu Ding
Aishan Liu
Haotong Qin
Pengliang Ji
Xianglong Liu
MQ
31
4
0
28 Oct 2024
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Yongchang Hao
Yanshuai Cao
Lili Mou
MQ
28
2
0
28 Oct 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun
Li-Wen Chang
Wenlei Bao
Size Zheng
Ningxin Zheng
Xin Liu
Harry Dong
Yuejie Chi
Beidi Chen
VLM
88
16
0
28 Oct 2024
Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management
Tuowei Wang
Ruwen Fan
Minxing Huang
Zixu Hao
Kun Li
Ting Cao
Youyou Lu
Yaoxue Zhang
Ju Ren
40
2
0
25 Oct 2024
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
34
2
0
23 Oct 2024
From Attention to Activation: Unravelling the Enigmas of Large Language Models
Prannay Kaul
Chengcheng Ma
Ismail Elezi
Jiankang Deng
26
2
0
22 Oct 2024
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
Jinda Jia
Cong Xie
Hanlin Lu
Daoce Wang
Hao Feng
...
Baixi Sun
Haibin Lin
Zhi-Li Zhang
Xin Liu
Dingwen Tao
MQ
20
3
0
20 Oct 2024
Understanding the Difficulty of Low-Precision Post-Training Quantization for LLMs
Zifei Xu
Sayeh Sharify
W. Yazar
T. Webb
Xin Eric Wang
MQ
33
0
0
18 Oct 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo
Druv Pai
Yu Bai
Jiantao Jiao
Michael I. Jordan
Song Mei
29
9
0
17 Oct 2024
Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching
Jie Peng
Zhang Cao
Huaizhi Qu
Zhengyu Zhang
Chang Guo
Yanyong Zhang
Zhichao Cao
Tianlong Chen
29
2
0
17 Oct 2024
AERO: Softmax-Only LLMs for Efficient Private Inference
N. Jha
Brandon Reagen
27
1
0
16 Oct 2024
Channel-Wise Mixed-Precision Quantization for Large Language Models
Zihan Chen
Bike Xie
Jundong Li
Cong Shen
MQ
22
2
0
16 Oct 2024
Scaling laws for post-training quantized large language models
Zifei Xu
Alexander Lan
W. Yazar
T. Webb
Sayeh Sharify
Xin Eric Wang
MQ
26
0
0
15 Oct 2024
Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars
Matheus Farias
H. T. Kung
15
1
0
15 Oct 2024
Error Diffusion: Post Training Quantization with Block-Scaled Number Formats for Neural Networks
Alireza Khodamoradi
K. Denolf
Eric Dellinger
MQ
21
0
0
15 Oct 2024
QSpec: Speculative Decoding with Complementary Quantization Schemes
Juntao Zhao
Wenhao Lu
Sheng Wang
Lingpeng Kong
Chuan Wu
MQ
62
5
0
15 Oct 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
Mu Cai
Reuben Tan
Jianrui Zhang
Bocheng Zou
Kai Zhang
...
Yao Dou
J. Park
Jianfeng Gao
Yong Jae Lee
Jianwei Yang
42
12
0
14 Oct 2024
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
C. L. P. Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
29
4
0
14 Oct 2024
SLaNC: Static LayerNorm Calibration
Mahsa Salmani
Nikita Trukhanov
I. Soloveychik
MQ
26
0
0
14 Oct 2024
big.LITTLE Vision Transformer for Efficient Visual Recognition
He Guo
Yulong Wang
Zixuan Ye
Jifeng Dai
Yuwen Xiong
ViT
50
0
0
14 Oct 2024