Quartet: Native FP4 Training Can Be Optimal for Large Language Models
arXiv:2505.14669v2
Roberto L. Castro, Andrei Panferov, Soroush Tabesh, Oliver Sieberling, Jiale Chen, Mahdi Nikdan, Saleh Ashkboos, Dan Alistarh (20 May 2025) [MQ]

Papers citing "Quartet: Native FP4 Training Can Be Optimal for Large Language Models" (27 of 27 papers shown)

Elucidating the Design Space of FP4 training
Robert Hu, Carlo Luschi, Paul Balanca (22 Sep 2025) [MQ]

QuEST: Stable Training of LLMs with 1-Bit Weights and Activations
Andrei Panferov, Jiale Chen, Soroush Tabesh, Roberto L. Castro, Mahdi Nikdan, Dan Alistarh (07 Feb 2025) [MQ]

Optimizing Large Language Model Training Using FP4 Quantization
Ruizhe Wang, Yeyun Gong, Xiao Liu, Guoshuai Zhao, Ziyue Yang, Baining Guo, Zhengjun Zha, Peng Cheng (28 Jan 2025) [MQ]

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman (30 Mar 2024) [MQ]

Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
Haocheng Xi, Yuxiang Chen, Kang Zhao, Kaijun Zheng, Jianfei Chen, Jun Zhu (19 Mar 2024) [MQ]

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana, Jacob P. Portes, Sasha Doubov, Jonathan Frankle (31 Dec 2023) [LRM]

BitNet: Scaling 1-bit Transformers for Large Language Models
Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei (17 Oct 2023) [MQ]

Scaling Laws for Sparsely-Connected Foundation Models
Elias Frantar, C. Riquelme, N. Houlsby, Dan Alistarh, Utku Evci (15 Sep 2023)

QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, Chris De Sa (25 Jul 2023) [MQ]

Training Transformers with 4-bit Integers
Haocheng Xi, Changhao Li, Jianfei Chen, Jun Zhu (21 Jun 2023) [MQ]

Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari S. Morcos, Ali Farhadi, Ludwig Schmidt (25 Apr 2023) [MQ, MLLM, VLM]

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers, M. Lewis, Younes Belkada, Luke Zettlemoyer (15 Aug 2022) [MQ]

Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry (19 Dec 2021) [MQ]

EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning
S. Vargaftik, Ran Ben-Basat, Amit Portnoy, Gal Mendelson, Y. Ben-Itzhak, Michael Mitzenmacher (19 Aug 2021) [FedML]

DRIVE: One-bit Distributed Mean Estimation
S. Vargaftik, Ran Ben-Basat, Amit Portnoy, Gal Mendelson, Y. Ben-Itzhak, Michael Mitzenmacher (18 May 2021) [OOD, FedML]

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner (18 Apr 2021) [AILaw]

LSQ+: Improving low-bit quantization through learnable offsets and better initialization
Yash Bhalgat, Jinwon Lee, Markus Nagel, Tijmen Blankevoort, Nojun Kwak (20 Apr 2020) [MQ]

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei (23 Jan 2020)

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu (23 Oct 2019) [AIMat]

Training High-Performance and Large-Scale Deep Neural Networks with Full 8-bit Integers
Yukuan Yang, Shuang Wu, Lei Deng, Tianyi Yan, Yuan Xie, Guoqi Li (05 Sep 2019) [MQ]

Learned Step Size Quantization
S. K. Esser, J. McKinstry, Deepika Bablani, R. Appuswamy, D. Modha (21 Feb 2019) [MQ]

Scalable Methods for 8-bit Training of Neural Networks
Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry (25 May 2018) [MQ]

PACT: Parameterized Clipping Activation for Quantized Neural Networks
Jungwook Choi, Zhuo Wang, Swagath Venkataramani, P. Chuang, Vijayalakshmi Srinivasan, K. Gopalakrishnan (16 May 2018) [MQ]

Mixed Precision Training
Paulius Micikevicius, Sharan Narang, Jonah Alben, G. Diamos, Erich Elsen, ..., Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu (10 Oct 2017)

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin (12 Jun 2017) [3DV]

Distributed Mean Estimation with Limited Communication
A. Suresh, Felix X. Yu, Sanjiv Kumar, H. B. McMahan (02 Nov 2016) [FedML]

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation
Yoshua Bengio, Nicholas Léonard, Aaron Courville (15 Aug 2013)