Should Under-parameterized Student Networks Copy or Average Teacher Weights?

3 November 2023

Papers citing "Should Under-parameterized Student Networks Copy or Average Teacher Weights?"

5 / 5 papers shown

Title
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence Berfin Simsek Amire Bendjeddou Daniel Hsu 32 0 0 13 Nov 2024
Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escape, and Network Embedding Zhengqing Wu Berfin Simsek Francois Ged ODL 30 0 0 08 Feb 2024
Learning Single-Index Models with Shallow Neural Networks A. Bietti Joan Bruna Clayton Sanford M. Song 155 65 0 27 Oct 2022
Neural Networks Efficiently Learn Low-Dimensional Representations with SGD Alireza Mousavi-Hosseini Sejun Park M. Girotti Ioannis Mitliagkas Murat A. Erdogdu MLT 313 48 0 29 Sep 2022
Toy Models of Superposition Nelson Elhage Tristan Hume Catherine Olsson Nicholas Schiefer T. Henighan ... Sam McCandlish Jared Kaplan Dario Amodei Martin Wattenberg C. Olah AAML MILM 120 314 0 21 Sep 2022