VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference
Conference on Machine Learning and Systems (MLSys), 2021
arXiv: 2102.04503
8 February 2021
Steve Dai, Rangharajan Venkatesan, Haoxing Ren, B. Zimmer, W. Dally, Brucek Khailany

Papers citing "VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference"

40 papers shown
Mixed-Precision Quantization for Language Models: Techniques and Prospects
M. Rakka, Marios Fournarakis, Olga Krestinskaya, Jinane Bazzi, K. Salama, Fadi J. Kurdahi, A. Eltawil, M. Fouda
19 Oct 2025

MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving
Jungi Lee, Junyong Park, Soohyun Cha, Jaehoon Cho, Jaewoong Sim
16 Oct 2025

Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study
Mocheng Li, Xiao Yan, Baotong Lu, Yue Zhang, James Cheng, Chenhao Ma
22 Aug 2025

Neural Network Quantization for Microcontrollers: A Comprehensive Survey of Methods, Platforms, and Applications
Hamza A. Abushahla, Dara Varam, Ariel J. N. Panopio, Mohamed I. AlHajri
20 Aug 2025

A Segmented Robot Grasping Perception Neural Network for Edge AI
Casper Bröcheler, Thomas Vroom, Derrick Timmermans, Alan van den Akker, Guangzhi Tang, Charalampos Kouzinopoulos, Rico Mockel
18 Jul 2025
Recipes for Pre-training LLMs with MXFP8
Asit K. Mishra, Dusan Stosic, Simon Layton, Paulius Micikevicius
30 May 2025

FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference
Coleman Hooper, Charbel Sakr, Ben Keller, Rangharajan Venkatesan, Kurt Keutzer, Siyang Song, Brucek Khailany
19 Apr 2025

LO-BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference
Reena Elangovan, Charbel Sakr, A. Raghunathan, Brucek Khailany
07 Feb 2025

SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity
Design Automation Conference (DAC), 2025
Zichen Fan, Steve Dai, Rangharajan Venkatesan, Dennis Sylvester, Brucek Khailany
28 Jan 2025
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
International Symposium on High-Performance Computer Architecture (HPCA), 2024
Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst
24 Nov 2024

BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
International Symposium on High-Performance Computer Architecture (HPCA), 2024
Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed S. Abdelfattah
18 Nov 2024

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
International Conference on Learning Representations (ICLR), 2024
Haocheng Xi, Han Cai, Ligeng Zhu, Yaojie Lu, Kurt Keutzer, Jianfei Chen, Song Han
25 Oct 2024

Scaling Laws For Mixed Quantization
Zeyu Cao, Boyang Gu, Cheng Zhang, Pedro Gimenes, Jianqiao Lu, Jianyi Cheng, Xitong Gao, Yiren Zhao
09 Oct 2024
A method of using RSVD in residual calculation of LowBit GEMM
Hongyaoxing Gu
27 Sep 2024

Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye, Grace Chu, Yanfeng Liu, Yichi Zhang, Lukasz Lew, Li Zhang, Mark Sandler, Andrew G. Howard
14 Sep 2024

Exploring FPGA designs for MX and beyond
Ebby Samson, Naveen Mellempudi, Wayne Luk, George A. Constantinides
01 Jul 2024

Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Beatrice Alessandra Motetti, Matteo Risso, Luca Bompani, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari
01 Jul 2024

SDQ: Sparse Decomposed Quantization for LLM Inference
Geonhwa Jeong, Po-An Tsai, S. Keckler, Tushar Krishna
19 Jun 2024
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, ..., Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh
31 May 2024

Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
International Conference on Machine Learning (ICML), 2024
Jordan Dotzel, Yuzong Chen, Bahaa Kotb, Sushma Prasad, Gang Wu, Sheng Li, Mohamed S. Abdelfattah, Zhiru Zhang
06 May 2024

Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon, Jeimin Jeon, Junyong Cheon, Bumsub Ham
01 Apr 2024

PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem, Conor McCullough, Randy Hsin, Chas Leichner, Shan Li, ..., Andrew G. Howard, Lukasz Lew, Sherief Reda, Ville Rautio, Daniele Moro
29 Mar 2024
FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization
Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan
28 Feb 2024

Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He
15 Feb 2024

Microscaling Data Formats for Deep Learning
B. Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, ..., Maxim Naumov, Colin Verilli, Ralph Wittig, Doug Burger, Eric S. Chung
16 Oct 2023

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George A. Constantinides, Yiren Zhao
08 Oct 2023
Photonic Accelerators for Image Segmentation in Autonomous Driving and Defect Detection
IEEE Conference on High Performance Extreme Computing (HPEC), 2023
Lakshmi Nair, David Widemann, Brad Turcott, Nick Moore, Alexandra Wleklinski, D. Bunandar, Ioannis Papavasileiou, Shihu Wang, Eric Logan
28 Sep 2023

INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
Lakshmi Nair, Mikhail Bernadskiy, Arulselvan Madhavan, Craig Chan, Ayon Basumallik, D. Bunandar
07 Jul 2023

Similarity search in the blink of an eye with compressed indices
Proceedings of the VLDB Endowment (PVLDB), 2023
Cecilia Aguerrebere, Ishwar Bhati, Mark Hildebrand, Mariano Tepper, Ted Willke
07 Apr 2023

RPTQ: Reorder-based Post-training Quantization for Large Language Models
Zhihang Yuan, Lin Niu, Jia-Wen Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu
03 Apr 2023
With Shared Microexponents, A Little Shifting Goes a Long Way
International Symposium on Computer Architecture (ISCA), 2023
Bita Darvish Rouhani, Ritchie Zhao, V. Elango, Rasoul Shafipour, Mathew Hall, ..., Eric S. Chung, Zhaoxia Deng, S. Naghshineh, Jongsoo Park, Maxim Naumov
16 Feb 2023

Hyperspherical Quantization: Toward Smaller and More Accurate Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Dan Liu, X. Chen, Chen Ma, Xue Liu
24 Dec 2022

Empirical Evaluation of Post-Training Quantization Methods for Language Tasks
Ting Hu, Christoph Meinel, Haojin Yang
29 Oct 2022

Block Format Error Bounds and Optimal Block Size Selection
I. Soloveychik, I. Lyubomirsky, Xin Eric Wang, S. Bhoja
11 Oct 2022

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Clemens J. S. Schaefer, Siddharth Joshi, Shane Li, Raul Blazquez
15 Jun 2022
Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training
International Conference on Machine Learning (ICML), 2022
Charbel Sakr, Steve Dai, Rangharajan Venkatesan, B. Zimmer, W. Dally, Brucek Khailany
13 Jun 2022

Variability-Aware Training and Self-Tuning of Highly Quantized DNNs for Analog PIM
Design, Automation and Test in Europe (DATE), 2021
Zihao Deng, Michael Orshansky
11 Nov 2021

TOD: GPU-accelerated Outlier Detection via Tensor Operations
Yue Zhao, George H. Chen, Zhihao Jia
26 Oct 2021

Pareto-Optimal Quantized ResNet Is Mostly 4-bit
AmirAli Abdolrashidi, Lisa Wang, Shivani Agrawal, J. Malmaud, Oleg Rybakov, Chas Leichner, Lukasz Lew
07 May 2021

DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Chee Hong, Heewon Kim, Sungyong Baik, Junghun Oh, Kyoung Mu Lee
21 Dec 2020