Training Deep Neural Networks with 8-bit Floating Point Numbers

Papers citing "Training Deep Neural Networks with 8-bit Floating Point Numbers"

Showing 50 of 212 papers.
Stochastic Rounding for LLM Training: Theory and Practice
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025 (27 Feb 2025)
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
Neural Information Processing Systems (NeurIPS), 2024 (13 Sep 2024)
Scalify: scale propagation for efficient low-precision LLM training
Paul Balança, Sam Hosegood, Carlo Luschi, Andrew Fitzgibbon (24 Jul 2024)