ResearchTrend.AI

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit (arXiv:2207.08799)

18 July 2022
Boaz Barak
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang

Papers citing "Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit"

50 / 108 papers shown
Feature emergence via margin maximization: case studies in algebraic tasks
  Depen Morwani, Benjamin L. Edelman, Costin-Andrei Oncescu, Rosie Zhao, Sham Kakade · 13 Nov 2023

Understanding Grokking Through A Robustness Viewpoint
  Zhiquan Tan, Weiran Huang · 11 Nov 2023 · AAML, OOD

Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
  Elan Rosenfeld, Andrej Risteski · 07 Nov 2023

In-Context Learning Dynamics with Random Binary Sequences
  Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, T. Ullman · 26 Oct 2023

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
  Jack Miller, Charles O'Neill, Thang Bui · 26 Oct 2023

To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
  Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov · 19 Oct 2023 · OOD

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
  David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox · 19 Oct 2023

Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression
  Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang · 17 Oct 2023

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
  Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka · 13 Oct 2023 · CoGe, DiffM

An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l
  James Dao, Yeu-Tong Lau, Can Rager, Jett Janiak · 11 Oct 2023

Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition
  Zhongtian Chen, Edmund Lau, Jake Mendel, Susan Wei, Daniel Murfet · 10 Oct 2023

Grokking as the Transition from Lazy to Rich Training Dynamics
  Tanishq Kumar, Blake Bordelon, Samuel Gershman, C. Pehlevan · 09 Oct 2023

Grokking as Compression: A Nonlinear Complexity Perspective
  Ziming Liu, Ziqian Zhong, Max Tegmark · 09 Oct 2023

Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods
  C. Caramanis, Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos · 08 Oct 2023

Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
  Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu · 04 Oct 2023 · MLT

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Yuandong Tian, Yiping Wang, Zhenyu (Allen) Zhang, Beidi Chen, Simon S. Du · 01 Oct 2023

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
  Fred Zhang, Neel Nanda · 27 Sep 2023 · LLMSV

SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem
  Margalit Glasgow · 26 Sep 2023 · MLT

Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
  Angelica Chen, Ravid Schwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra · 13 Sep 2023

Pretraining on the Test Set Is All You Need
  Rylan Schaeffer · 13 Sep 2023

Gradient-Based Feature Learning under Structured Data
  Alireza Mousavi-Hosseini, Denny Wu, Taiji Suzuki, Murat A. Erdogdu · 07 Sep 2023 · MLT

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
  Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang · 07 Sep 2023

Explaining grokking through circuit efficiency
  Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar · 05 Sep 2023

Latent State Models of Training Dynamics
  Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho · 18 Aug 2023

On Single Index Models beyond Gaussian Data
  Joan Bruna, Loucas Pillaud-Vivien, Aaron Zweig · 28 Jul 2023

The semantic landscape paradigm for neural networks
  Shreyas Gokhale · 18 Jul 2023

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks
  Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai · 13 Jul 2023

Large Language Models
  Michael R Douglas · 11 Jul 2023 · LLMAG, LM&MA

How Deep Neural Networks Learn Compositional Data: The Random Hierarchy Model
  Francesco Cagnetta, Leonardo Petrini, Umberto M. Tomasini, Alessandro Favero, M. Wyart · 05 Jul 2023 · BDL

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks
  Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas · 30 Jun 2023

Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs
  Emmanuel Abbe, Elisabetta Cornacchia, Aryo Lotfi · 29 Jun 2023

Exposing Attention Glitches with Flip-Flop Language Modeling
  Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang · 01 Jun 2023 · LRM

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
  Yuandong Tian, Yiping Wang, Beidi Chen, S. Du · 25 May 2023 · MLT

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks
  Eshaan Nichani, Alexandru Damian, Jason D. Lee · 11 May 2023 · MLT

Are Emergent Abilities of Large Language Models a Mirage?
  Rylan Schaeffer, Brando Miranda, Oluwasanmi Koyejo · 28 Apr 2023 · LRM

An Over-parameterized Exponential Regression
  Yeqi Gao, Sridhar Mahadevan, Zhao-quan Song · 29 Mar 2023

The Quantization Model of Neural Scaling
  Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark · 23 Mar 2023 · MILM

A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks
  William Merrill, Nikolaos Tsilivis, Aman Shukla · 21 Mar 2023

Practically Solving LPN in High Noise Regimes Faster Using Neural Networks
  Haozhe Jiang, Kaiyue Wen, Yi-Long Chen · 14 Mar 2023

Learning time-scales in two-layers neural networks
  Raphael Berthier, Andrea Montanari, Kangjie Zhou · 28 Feb 2023

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
  Emmanuel Abbe, Enric Boix-Adserà, Theodor Misiakiewicz · 21 Feb 2023 · FedML, MLT

A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
  Bilal Chughtai, Lawrence Chan, Neel Nanda · 06 Feb 2023

A Mathematical Model for Curriculum Learning for Parities
  Elisabetta Cornacchia, Elchanan Mossel · 31 Jan 2023

Progress measures for grokking via mechanistic interpretability
  Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt · 12 Jan 2023

Grokking modular arithmetic
  Andrey Gromov · 06 Jan 2023

Can Large Language Models Change User Preference Adversarially?
  Varshini Subhash · 05 Jan 2023 · AAML

Problem-Dependent Power of Quantum Neural Networks on Multi-Class Classification
  Yuxuan Du, Yibo Yang, Dacheng Tao, Min-hsiu Hsieh · 29 Dec 2022

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
  S. Bhattamishra, Arkil Patel, Varun Kanade, Phil Blunsom · 22 Nov 2022

Convexifying Transformers: Improving optimization and understanding of transformer networks
  Tolga Ergen, Behnam Neyshabur, Harsh Mehta · 20 Nov 2022 · MLT

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt · 01 Nov 2022