Intriguing Properties of Quantization at Scale

Intriguing Properties of Quantization at Scale

30 May 2023

Bharat Venkitesh

Papers citing "Intriguing Properties of Quantization at Scale"

14 / 14 papers shown

Title
Fast and Low-Cost Genomic Foundation Models via Outlier Removal Haozheng Luo Chenghao Qiu Maojiang Su Zhihan Zhou Zoe Mehta Guo Ye Jerry Yao-Chieh Hu Han Liu AAML 55 0 0 01 May 2025
Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models Lucas Maisonnave Cyril Moineau Olivier Bichler Fabrice Rastello MQ 69 1 0 30 Apr 2025
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization Minsu Kim Seongmin Hong RyeoWook Ko S. Choi Hunjong Lee Junsoo Kim J. Kim Jongse Park 52 0 0 24 Mar 2025
Are formal and functional linguistic mechanisms dissociated in language models? Michael Hanna Sandro Pezzelle Yonatan Belinkov 45 0 0 14 Mar 2025
$u-$\mu$P: The Unit-Scaled Maximal Update Parametrization$ u- $\mu$ P: The Unit-Scaled Maximal Update Parametrization Charlie Blake C. Eichenberg Josef Dean Lukas Balles Luke Y. Prince Bjorn Deiseroth Andres Felipe Cruz Salinas Carlo Luschi Samuel Weinbach Douglas Orr 51 9 0 24 Jul 2024
A Simple and Effective Pruning Approach for Large Language Models Mingjie Sun Zhuang Liu Anna Bair J. Zico Kolter 35 350 0 20 Jun 2023
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU Ying Sheng Lianmin Zheng Binhang Yuan Zhuohan Li Max Ryabinin ... Joseph E. Gonzalez Percy Liang Christopher Ré Ion Stoica Ce Zhang 144 365 0 13 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model Aohan Zeng Xiao Liu Zhengxiao Du Zihan Wang Hanyu Lai ... Jidong Zhai Wenguang Chen Peng-Zhen Zhang Yuxiao Dong Jie Tang BDL LRM 242 1,070 0 05 Oct 2022
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency Giovanni Puccetti Anna Rogers Aleksandr Drozd F. Dell’Orletta 71 42 0 23 May 2022
The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation Orevaoghene Ahia Julia Kreutzer Sara Hooker 107 46 0 06 Oct 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation Ofir Press Noah A. Smith M. Lewis 242 690 0 27 Aug 2021
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 226 4,424 0 23 Jan 2020
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand M. Andreetto Hartwig Adam 3DH 948 20,214 0 17 Apr 2017
Improving neural networks by preventing co-adaptation of feature detectors Geoffrey E. Hinton Nitish Srivastava A. Krizhevsky Ilya Sutskever Ruslan Salakhutdinov VLM 243 7,597 0 03 Jul 2012