A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness

6 September 2023
Ze Peng, Lei Qi, Yinghuan Shi, Yang Gao

Papers citing "A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness"

7 papers shown

Learning Neural Networks with Sparse Activations
Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath, Raghu Meka
26 Jun 2024

Progress Measures for Grokking on Real-world Tasks
Satvik Golechha
21 May 2024

Where We Have Arrived in Proving the Emergence of Sparse Symbolic Concepts in AI Models
Qihan Ren, Maximilian Brunner, Wen Shen, S. Mintchev
03 May 2023

Primer: Searching for Efficient Transformers for Language Modeling
David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam M. Shazeer, Quoc V. Le
VLM
17 Sep 2021

MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
04 May 2021

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
15 Sep 2016