A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness

6 September 2023
Ze Peng, Lei Qi, Yinghuan Shi, Yang Gao

Papers citing "A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness"

7 papers shown

Learning Neural Networks with Sparse Activations
Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath, Raghu Meka
26 Jun 2024

Progress Measures for Grokking on Real-world Tasks
Satvik Golechha
21 May 2024

Where We Have Arrived in Proving the Emergence of Sparse Symbolic Concepts in AI Models
Qihan Ren, Maximilian Brunner, Wen Shen, S. Mintchev
03 May 2023

Primer: Searching for Efficient Transformers for Language Modeling
David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam M. Shazeer, Quoc V. Le
VLM
17 Sep 2021

MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
04 May 2021

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
15 Sep 2016