Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
arXiv 2207.08799 · 18 July 2022
Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
Cited By

Papers citing "Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit" (50 of 108 papers shown)
Feature emergence via margin maximization: case studies in algebraic tasks
Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade
13 Nov 2023

Understanding Grokking Through A Robustness Viewpoint
Zhiquan Tan, Weiran Huang
11 Nov 2023 · AAML, OOD

Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
Elan Rosenfeld, Andrej Risteski
07 Nov 2023

In-Context Learning Dynamics with Random Binary Sequences
Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, T. Ullman
26 Oct 2023

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
Jack Miller, Charles O'Neill, Thang Bui
26 Oct 2023

To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov
19 Oct 2023 · OOD

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox
19 Oct 2023

Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression
Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang
17 Oct 2023

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
13 Oct 2023 · CoGe, DiffM

An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l
James Dao, Yeu-Tong Lau, Can Rager, Jett Janiak
11 Oct 2023

Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition
Zhongtian Chen, Edmund Lau, Jake Mendel, Susan Wei, Daniel Murfet
10 Oct 2023

Grokking as the Transition from Lazy to Rich Training Dynamics
Tanishq Kumar, Blake Bordelon, Samuel Gershman, C. Pehlevan
09 Oct 2023

Grokking as Compression: A Nonlinear Complexity Perspective
Ziming Liu, Ziqian Zhong, Max Tegmark
09 Oct 2023

Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods
C. Caramanis, Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos
08 Oct 2023

Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu
04 Oct 2023 · MLT

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du
01 Oct 2023

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang, Neel Nanda
27 Sep 2023 · LLMSV

SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem
Margalit Glasgow
26 Sep 2023 · MLT

Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen, Ravid Schwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra
13 Sep 2023

Pretraining on the Test Set Is All You Need
Rylan Schaeffer
13 Sep 2023

Gradient-Based Feature Learning under Structured Data
Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu
07 Sep 2023 · MLT

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
07 Sep 2023

Explaining grokking through circuit efficiency
Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar
05 Sep 2023

Latent State Models of Training Dynamics
Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho
18 Aug 2023

On Single Index Models beyond Gaussian Data
Joan Bruna, Loucas Pillaud-Vivien, Aaron Zweig
28 Jul 2023

The semantic landscape paradigm for neural networks
Shreyas Gokhale
18 Jul 2023

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks
Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai
13 Jul 2023

Large Language Models
Michael R Douglas
11 Jul 2023 · LLMAG, LM&MA

How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model
Francesco Cagnetta, Leonardo Petrini, Umberto M. Tomasini, Alessandro Favero, M. Wyart
05 Jul 2023 · BDL

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks
Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas
30 Jun 2023

Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs
Emmanuel Abbe, Elisabetta Cornacchia, Aryo Lotfi
29 Jun 2023

Exposing Attention Glitches with Flip-Flop Language Modeling
Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang
01 Jun 2023 · LRM

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian, Yiping Wang, Beidi Chen, S. Du
25 May 2023 · MLT

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks
Eshaan Nichani, Alexandru Damian, Jason D. Lee
11 May 2023 · MLT

Are Emergent Abilities of Large Language Models a Mirage?
Rylan Schaeffer, Brando Miranda, Oluwasanmi Koyejo
28 Apr 2023 · LRM

An Over-parameterized Exponential Regression
Yeqi Gao, Sridhar Mahadevan, Zhao-quan Song
29 Mar 2023

The Quantization Model of Neural Scaling
Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark
23 Mar 2023 · MILM

A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks
William Merrill, Nikolaos Tsilivis, Aman Shukla
21 Mar 2023

Practically Solving LPN in High Noise Regimes Faster Using Neural Networks
Haozhe Jiang, Kaiyue Wen, Yi-Long Chen
14 Mar 2023

Learning time-scales in two-layers neural networks
Raphael Berthier, Andrea Montanari, Kangjie Zhou
28 Feb 2023

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
Emmanuel Abbe, Enric Boix-Adserà, Theodor Misiakiewicz
21 Feb 2023 · FedML, MLT

A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
Bilal Chughtai, Lawrence Chan, Neel Nanda
06 Feb 2023

A Mathematical Model for Curriculum Learning for Parities
Elisabetta Cornacchia, Elchanan Mossel
31 Jan 2023

Progress measures for grokking via mechanistic interpretability
Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
12 Jan 2023

Grokking modular arithmetic
Andrey Gromov
06 Jan 2023

Can Large Language Models Change User Preference Adversarially?
Varshini Subhash
05 Jan 2023 · AAML

Problem-Dependent Power of Quantum Neural Networks on Multi-Class Classification
Yuxuan Du, Yibo Yang, Dacheng Tao, Min-hsiu Hsieh
29 Dec 2022

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
S. Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom
22 Nov 2022

Convexifying Transformers: Improving optimization and understanding of transformer networks
Tolga Ergen, Behnam Neyshabur, Harsh Mehta
20 Nov 2022 · MLT

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022