ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.14826
  4. Cited By
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via
  Generalized Straight-Through Estimation

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation

29 November 2021
Zechun Liu
Kwang-Ting Cheng
Dong Huang
Eric P. Xing
Zhiqiang Shen
    MQ
ArXivPDFHTML

Papers citing "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation"

19 / 19 papers shown
Title
BackSlash: Rate Constrained Optimized Training of Large Language Models
BackSlash: Rate Constrained Optimized Training of Large Language Models
Jun Wu
Jiangtao Wen
Yuxing Han
34
0
0
23 Apr 2025
LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
Zikai Zhou
Qizheng Zhang
Hermann Kumbong
Kunle Olukotun
MQ
212
0
0
12 Feb 2025
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Y. Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
63
9
0
25 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
29
13
0
15 Oct 2024
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao
Jie Ou
Lei Wang
Fanhua Shang
Jaji Wu
MQ
45
0
0
22 Jul 2024
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing
Boyan Gao
Zheng Zhang
David A. Clifton
Shitao Xiao
LI DU
Guoqi Li
Jiajun Zhang
50
5
0
05 Jul 2024
LCQ: Low-Rank Codebook based Quantization for Large Language Models
LCQ: Low-Rank Codebook based Quantization for Large Language Models
Wen-Pu Cai
Wu-Jun Li
Wu-Jun Li
MQ
38
0
0
31 May 2024
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression
  and Deployment Toolkit for Prototype Hardware Accelerator Design
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Jian Meng
Yuan Liao
Anupreetham Anupreetham
Ahmed Hassan
Shixing Yu
Han-Sok Suh
Xiaofeng Hu
Jae-sun Seo
MQ
47
1
0
02 May 2024
Quantized Feature Distillation for Network Quantization
Quantized Feature Distillation for Network Quantization
Kevin Zhu
Yin He
Jianxin Wu
MQ
24
9
0
20 Jul 2023
MobileNMT: Enabling Translation in 15MB and 30ms
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
25
1
0
07 Jun 2023
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Zechun Liu
Barlas Oğuz
Changsheng Zhao
Ernie Chang
Pierre Stock
Yashar Mehdad
Yangyang Shi
Raghuraman Krishnamoorthi
Vikas Chandra
MQ
46
187
0
29 May 2023
End-to-end codesign of Hessian-aware quantized neural networks for FPGAs
  and ASICs
End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs
Javier Campos
Zhen Dong
Javier Mauricio Duarte
A. Gholami
Michael W. Mahoney
Jovan Mitrevski
Nhan Tran
MQ
24
3
0
13 Apr 2023
Oscillation-free Quantization for Low-bit Vision Transformers
Oscillation-free Quantization for Low-bit Vision Transformers
Shi Liu
Zechun Liu
Kwang-Ting Cheng
MQ
13
34
0
04 Feb 2023
Efficient and Effective Methods for Mixed Precision Neural Network
  Quantization for Faster, Energy-efficient Inference
Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference
Deepika Bablani
J. McKinstry
S. K. Esser
R. Appuswamy
D. Modha
MQ
15
4
0
30 Jan 2023
Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware
  Training
Exploiting the Partly Scratch-off Lottery Ticket for Quantization-Aware Training
Yunshan Zhong
Gongrui Nan
Yu-xin Zhang
Fei Chao
Rongrong Ji
MQ
18
3
0
12 Nov 2022
Sharpness-aware Quantization for Deep Neural Networks
Sharpness-aware Quantization for Deep Neural Networks
Jing Liu
Jianfei Cai
Bohan Zhuang
MQ
21
24
0
24 Nov 2021
TACC: A Full-stack Cloud Computing Infrastructure for Machine Learning
  Tasks
TACC: A Full-stack Cloud Computing Infrastructure for Machine Learning Tasks
Kaiqiang Xu
Xinchen Wan
Hao Wang
Zhenghang Ren
Xudong Liao
D. Sun
Chaoliang Zeng
Kai Chen
49
33
0
04 Oct 2021
Differentiable Model Compression via Pseudo Quantization Noise
Differentiable Model Compression via Pseudo Quantization Noise
Alexandre Défossez
Yossi Adi
Gabriel Synnaeve
DiffM
MQ
10
46
0
20 Apr 2021
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
  Applications
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
950
20,561
0
17 Apr 2017
1