With Shared Microexponents, A Little Shifting Goes a Long Way

16 February 2023
Bita Darvish Rouhani
Ritchie Zhao
V. Elango
Rasoul Shafipour
Mathew Hall
Maral Mesmakhosroshahi
Ankit More
Levi Melnick
Maximilian Golub
G. Varatkar
Lai Shao
Gaurav Kolhe
Dimitry Melts
Jasmine Klar
Renee L'Heureux
Matt Perry
Doug Burger
Eric S. Chung
Zhaoxia Deng
S. Naghshineh
Jongsoo Park
Maxim Naumov
    MQ

Papers citing "With Shared Microexponents, A Little Shifting Goes a Long Way"

30 / 30 papers shown
MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization
Daeun Kim
Jinwoo Hwang
Changhun Oh
Jongse Park
MQ
35
0
0
11 Apr 2025
Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques
Neusha Javidnia
B. Rouhani
F. Koushanfar
49
0
0
14 Mar 2025
BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference
Reena Elangovan
Charbel Sakr
A. Raghunathan
Brucek Khailany
MQ
38
1
0
07 Feb 2025
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
Chao Fang
Man Shi
Robin Geens
Arne Symons
Zhongfeng Wang
Marian Verhelst
69
0
0
24 Nov 2024
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Yuzong Chen
Ahmed F. AbouElhamayed
Xilai Dai
Yang Wang
Marta Andronic
G. Constantinides
Mohamed S. Abdelfattah
MQ
93
0
0
18 Nov 2024
Error Diffusion: Post Training Quantization with Block-Scaled Number Formats for Neural Networks
Alireza Khodamoradi
K. Denolf
Eric Dellinger
MQ
18
0
0
15 Oct 2024
SLaNC: Static LayerNorm Calibration
Mahsa Salmani
Nikita Trukhanov
I. Soloveychik
MQ
23
0
0
14 Oct 2024
Scaling Laws for Mixed quantization in Large Language Models
Zeyu Cao
Cheng Zhang
Pedro Gimenes
Jianqiao Lu
Jianyi Cheng
Yiren Zhao
MQ
29
1
0
09 Oct 2024
QERA: an Analytical Framework for Quantization Error Reconstruction
Cheng Zhang
Jeffrey T. H. Wong
Can Xiao
G. Constantinides
Yiren Zhao
MQ
35
0
0
08 Oct 2024
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
46
9
0
24 Jul 2024
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression
Hao Feng
Boyuan Zhang
Fanjiang Ye
Min Si
Ching-Hsiang Chu
...
Summer Deng
Yuchen Hao
Pavan Balaji
Tong Geng
Dingwen Tao
AI4CE
21
2
0
05 Jul 2024
SDQ: Sparse Decomposed Quantization for LLM Inference
Geonhwa Jeong
Po-An Tsai
S. Keckler
Tushar Krishna
MQ
30
3
0
19 Jun 2024
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization
Jungi Lee
Wonbeom Lee
Jaewoong Sim
MQ
18
14
0
16 Jun 2024
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Burcu Harma
Ayan Chakraborty
Elizaveta Kostenok
Danila Mishin
Dongho Ha
...
Martin Jaggi
Ming Liu
Yunho Oh
Suvinay Subramanian
Amir Yazdanbakhsh
MQ
22
4
0
31 May 2024
Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs
Jordan Dotzel
Yuzong Chen
Bahaa Kotb
Sushma Prasad
Gang Wu
Sheng R. Li
Mohamed S. Abdelfattah
Zhiru Zhang
19
7
0
06 May 2024
DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics
Yoonsung Kim
Changhun Oh
Jinwoo Hwang
Wonung Kim
Seongryong Oh
Yubin Lee
Hardik Sharma
Amir Yazdanbakhsh
Jongse Park
25
7
0
21 Mar 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
37
77
0
26 Feb 2024
GPTVQ: The Blessing of Dimensionality for LLM Quantization
M. V. Baalen
Andrey Kuzmin
Markus Nagel
Peter Couperus
Cédric Bastoul
E. Mahurin
Tijmen Blankevoort
Paul N. Whatmough
MQ
26
28
0
23 Feb 2024
LQER: Low-Rank Quantization Error Reconstruction for LLMs
Cheng Zhang
Jianyi Cheng
G. Constantinides
Yiren Zhao
MQ
16
8
0
04 Feb 2024
Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production
Chandra Irugalbandara
Ashish Mahendra
Roland Daynauth
T. Arachchige
Jayanaka L. Dantanarayana
K. Flautner
Lingjia Tang
Yiping Kang
Jason Mars
ELM
21
14
0
20 Dec 2023
ESPN: Memory-Efficient Multi-Vector Information Retrieval
Susav Shrestha
Narasimha Reddy
Zongwang Li
17
5
0
09 Dec 2023
Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
M. Ibrahim
Shaizeen Aga
Ada Li
Suchita Pati
Mahzabeen Islam
21
2
0
08 Nov 2023
Microscaling Data Formats for Deep Learning
B. Rouhani
Ritchie Zhao
Ankit More
Mathew Hall
Alireza Khodamoradi
...
Maxim Naumov
Colin Verilli
Ralph Wittig
Doug Burger
Eric S. Chung
MQ
16
16
0
16 Oct 2023
FP8 versus INT8 for efficient deep learning inference
M. V. Baalen
Andrey Kuzmin
Suparna S. Nair
Yuwei Ren
E. Mahurin
...
Sundar Subramanian
Sanghyuk Lee
Markus Nagel
Joseph B. Soriaga
Tijmen Blankevoort
MQ
13
43
0
31 Mar 2023
Accuracy Booster: Enabling 4-bit Fixed-point Arithmetic for DNN Training
Simla Burcu Harma
Canberk Sonmez
Nicholas Sperry
Babak Falsafi
Martin Jaggi
Yunho Oh
MQ
13
4
0
19 Nov 2022
FP8 Formats for Deep Learning
Paulius Micikevicius
Dusan Stosic
N. Burgess
Marius Cornea
Pradeep Dubey
...
Naveen Mellempudi
S. Oberman
M. Shoeybi
Michael Siu
Hao Wu
BDL
VLM
MQ
65
119
0
12 Sep 2022
Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training
Miloš Nikolić
Enrique Torres Sanchez
Jia-Hui Wang
Ali Hadi Zadeh
Mostafa Mahmoud
Ameer Abdelhadi
Kareem Ibrahim
Andreas Moshovos
MQ
12
1
0
28 Apr 2022
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
4,424
0
23 Jan 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen
Zhen Dong
Jiayu Ye
Linjian Ma
Z. Yao
A. Gholami
Michael W. Mahoney
Kurt Keutzer
MQ
214
571
0
12 Sep 2019
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,435
0
26 Sep 2016