Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM
Inference?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

8 October 2023

George A. Constantinides

Papers citing "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?"

MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving. Jungi Lee, Junyong Park, Soohyun Cha, Jaehoon Cho, Jaewoong Sim. 16 Oct 2025.
Exploring and Reshaping the Weight Distribution in LLM. Chunming Ye, Songzhou Li, Xu Xu. 24 Aug 2025.
Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom. Rishika Sen, Sujoy Roychowdhury, Sumit Soman, H. G. Ranjani, Srikhetra Mohanty. 28 Apr 2025.
Scaling Laws for Floating Point Quantization Training. Xingwu Sun, Shuaipeng Li, Ruobing Xie, Weidong Han, Kan Wu, ..., Yangyu Tao, Zhanhui Kang, C. Xu, Di Wang, Jie Jiang. 05 Jan 2025.
Scaling Laws For Mixed Quantization. Zeyu Cao, Boyang Gu, Cheng Zhang, Pedro Gimenes, Jianqiao Lu, Jianyi Cheng, Xitong Gao, Yiren Zhao. 09 Oct 2024.
Exploring FPGA designs for MX and beyond. Ebby Samson, Naveen Mellempudi, Wayne Luk, George A. Constantinides. 01 Jul 2024.
Is Temperature the Creativity Parameter of Large Language Models? Max Peeperkorn, Tom Kouwenhoven, Daniel G. Brown, Anna K. Jordanous. 01 May 2024.
LQER: Low-Rank Quantization Error Reconstruction for LLMs. Cheng Zhang, Jianyi Cheng, George A. Constantinides, Yiren Zhao. 04 Feb 2024.