VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Conference on Machine Learning and Systems (MLSys), 2021
arXiv: 2102.04503
8 February 2021
Steve Dai, Rangharajan Venkatesan, Haoxing Ren, B. Zimmer, W. Dally, Brucek Khailany

Papers citing "VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference"

40 papers shown
Mixed-Precision Quantization for Language Models: Techniques and Prospects
M. Rakka, Marios Fournarakis, Olga Krestinskaya, Jinane Bazzi, K. Salama, Fadi J. Kurdahi, A. Eltawil, M. Fouda
19 Oct 2025

MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving
Jungi Lee, Junyong Park, Soohyun Cha, Jaehoon Cho, Jaewoong Sim
16 Oct 2025

Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study
Mocheng Li, Xiao Yan, Baotong Lu, Yue Zhang, James Cheng, Chenhao Ma
22 Aug 2025

Neural Network Quantization for Microcontrollers: A Comprehensive Survey of Methods, Platforms, and Applications
Hamza A. Abushahla, Dara Varam, Ariel J. N. Panopio, Mohamed I. AlHajri
20 Aug 2025

A Segmented Robot Grasping Perception Neural Network for Edge AI
Casper Bröcheler, Thomas Vroom, Derrick Timmermans, Alan van den Akker, Guangzhi Tang, Charalampos Kouzinopoulos, Rico Mockel
18 Jul 2025
Recipes for Pre-training LLMs with MXFP8
Asit K. Mishra, Dusan Stosic, Simon Layton, Paulius Micikevicius
30 May 2025

FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
Coleman Hooper, Charbel Sakr, Ben Keller, Rangharajan Venkatesan, Kurt Keutzer, Siyang Song, Brucek Khailany
19 Apr 2025

LO-BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference
Reena Elangovan, Charbel Sakr, A. Raghunathan, Brucek Khailany
07 Feb 2025

SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity
Design Automation Conference (DAC), 2025
Zichen Fan, Steve Dai, Rangharajan Venkatesan, Dennis Sylvester, Brucek Khailany
28 Jan 2025
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
International Symposium on High-Performance Computer Architecture (HPCA), 2024
Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst
24 Nov 2024

BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
International Symposium on High-Performance Computer Architecture (HPCA), 2024
Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed S. Abdelfattah
18 Nov 2024

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
International Conference on Learning Representations (ICLR), 2024
Haocheng Xi, Han Cai, Ligeng Zhu, Yaojie Lu, Kurt Keutzer, Jianfei Chen, Song Han
25 Oct 2024

Scaling Laws For Mixed Quantization
Zeyu Cao, Boyang Gu, Cheng Zhang, Pedro Gimenes, Jianqiao Lu, Jianyi Cheng, Xitong Gao, Yiren Zhao
09 Oct 2024
A method of using RSVD in residual calculation of LowBit GEMM
Hongyaoxing Gu
27 Sep 2024

Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Li Zhang, Mark Sandler, Andrew G. Howard
14 Sep 2024

Exploring FPGA designs for MX and beyond
Ebby Samson, Naveen Mellempudi, Wayne Luk, George A. Constantinides
01 Jul 2024

Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti, Matteo Risso, Luca Bompani, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
01 Jul 2024

SDQ: Sparse Decomposed Quantization for LLM Inference
Geonhwa Jeong, Po-An Tsai, S. Keckler, Tushar Krishna
19 Jun 2024
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, ..., Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh
31 May 2024

Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
International Conference on Machine Learning (ICML), 2024
Jordan Dotzel, Yuzong Chen, Bahaa Kotb, Sushma Prasad, Gang Wu, Sheng Li, Mohamed S. Abdelfattah, Zhiru Zhang
06 May 2024

Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon, Jeimin Jeon, Junyong Cheon, Bumsub Ham
01 Apr 2024

PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem, Conor McCullough, Randy Hsin, Chas Leichner, Shan Li, ..., Andrew G. Howard, Lukasz Lew, Sherief Reda, Ville Rautio, Daniele Moro
29 Mar 2024
FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan
28 Feb 2024

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
15 Feb 2024

Microscaling Data Formats for Deep Learning
B. Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, ..., Maxim Naumov, Colin Verilli, Ralph Wittig, Doug Burger, Eric S. Chung
16 Oct 2023

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George A. Constantinides, Yiren Zhao
08 Oct 2023
Photonic Accelerators for Image Segmentation in Autonomous Driving and Defect Detection
IEEE Conference on High Performance Extreme Computing (HPEC), 2023
Lakshmi Nair, David Widemann, Brad Turcott, Nick Moore, Alexandra Wleklinski, D. Bunandar, Ioannis Papavasileiou, Shihu Wang, Eric Logan
28 Sep 2023

INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
Lakshmi Nair, Mikhail Bernadskiy, Arulselvan Madhavan, Craig Chan, Ayon Basumallik, D. Bunandar
07 Jul 2023

Similarity search in the blink of an eye with compressed indices
Proceedings of the VLDB Endowment (PVLDB), 2023
Cecilia Aguerrebere, Ishwar Bhati, Mark Hildebrand, Mariano Tepper, Ted Willke
07 Apr 2023

RPTQ: Reorder-based Post-training Quantization for Large Language Models
Zhihang Yuan, Lin Niu, Jia-Wen Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu
03 Apr 2023
With Shared Microexponents, A Little Shifting Goes a Long Way
International Symposium on Computer Architecture (ISCA), 2023
Bita Darvish Rouhani, Ritchie Zhao, V. Elango, Rasoul Shafipour, Mathew Hall, ..., Eric S. Chung, Zhaoxia Deng, S. Naghshineh, Jongsoo Park, Maxim Naumov
16 Feb 2023

Hyperspherical Quantization: Toward Smaller and More Accurate Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Dan Liu, X. Chen, Chen Ma, Xue Liu
24 Dec 2022

Empirical Evaluation of Post-Training Quantization Methods for Language Tasks
Ting Hu, Christoph Meinel, Haojin Yang
29 Oct 2022

Block Format Error Bounds and Optimal Block Size Selection
I. Soloveychik, I. Lyubomirsky, Xin Eric Wang, S. Bhoja
11 Oct 2022

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Clemens J. S. Schaefer, Siddharth Joshi, Shane Li, Raul Blazquez
15 Jun 2022
Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training
International Conference on Machine Learning (ICML), 2022
Charbel Sakr, Steve Dai, Rangharajan Venkatesan, B. Zimmer, W. Dally, Brucek Khailany
13 Jun 2022

Variability-Aware Training and Self-Tuning of Highly Quantized DNNs for Analog PIM
Design, Automation and Test in Europe (DATE), 2021
Zihao Deng, Michael Orshansky
11 Nov 2021

TOD: GPU-accelerated Outlier Detection via Tensor Operations
Yue Zhao, George H. Chen, Zhihao Jia
26 Oct 2021

Pareto-Optimal Quantized ResNet Is Mostly 4-bit
AmirAli Abdolrashidi, Lisa Wang, Shivani Agrawal, J. Malmaud, Oleg Rybakov, Chas Leichner, Lukasz Lew
07 May 2021

DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Chee Hong, Heewon Kim, Sungyong Baik, Junghun Oh, Kyoung Mu Lee
21 Dec 2020