To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability

29 May 2024

Papers citing "To FP8 and Back Again: Quantifying Reduced Precision Effects on LLM Training Stability"

Title
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang, Haotian Hu, Zhenyu (Allen) Zhang, Gaojie Jin, X. Li, ..., Tianlong Chen, Lu Liu, Qingsong Wen, Zhangyang Wang, Shiwei Liu
MQ 33 0 0 24 Feb 2025

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL 273 2,878 0 15 Sep 2016