Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

8 October 2023

Cheng Zhang

Jianyi Cheng

Ilia Shumailov

George A. Constantinides

Yiren Zhao

Papers citing "Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?"

MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving

108

16 Oct 2025

Exploring and Reshaping the Weight Distribution in LLM

Chunming Ye

Songzhou Li

Xu Xu

161

24 Aug 2025

Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom

372

28 Apr 2025

Scaling Laws for Floating Point Quantization Training

...

466

05 Jan 2025

Scaling Laws For Mixed Quantization

339

09 Oct 2024

Exploring FPGA designs for MX and beyond

Ebby Samson

Naveen Mellempudi

Wayne Luk

George A. Constantinides

155

01 Jul 2024

Is Temperature the Creativity Parameter of Large Language Models?

234

101

01 May 2024

LQER: Low-Rank Quantization Error Reconstruction for LLMs

Cheng Zhang

Jianyi Cheng

George A. Constantinides

Yiren Zhao

424

04 Feb 2024