Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training. International Conference on Learning Representations (ICLR), 2023
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions. Neural Information Processing Systems (NeurIPS), 2023
Generalisation under gradient descent via deterministic PAC-Bayes. International Conference on Algorithmic Learning Theory (ALT), 2022
Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules. Neural Information Processing Systems (NeurIPS), 2022
Neuronal diversity can improve machine learning for physics and beyond. Scientific Reports (Sci Rep), 2022
When Do Flat Minima Optimizers Work? Neural Information Processing Systems (NeurIPS), 2022
Hessian Eigenspectra of More Realistic Nonlinear Models. Neural Information Processing Systems (NeurIPS), 2021