Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness
E. Zelikman, Qian Huang, Percy Liang, Nick Haber, Noah D. Goodman
16 June 2023 (arXiv: 2306.10015)
Papers citing "Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness" (4 of 4 papers shown)
An Empirical Comparison of Optimizers for Quantum Machine Learning with SPSA-based Gradients
Marco Wiedmann, Marc Hölle, Maniraman Periyasamy, Nico Meyer, Christian Ufrecht, Daniel D. Scherer, Axel Plinge, Christopher Mutschler
27 Apr 2023
Sparse Random Networks for Communication-Efficient Federated Learning
Berivan Isik, Francesco Pase, Deniz Gunduz, Tsachy Weissman, M. Zorzi
30 Sep 2022
On the advantages of stochastic encoders
Lucas Theis, E. Agustsson
18 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020