2207.08799
Cited By
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
18 July 2022
Boaz Barak
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
Papers citing
"Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit"
50 / 108 papers shown
Title
New Statistical and Computational Results for Learning Junta Distributions
Lorenzo Beretta
16
0
0
09 May 2025
Quiet Feature Learning in Algorithmic Tasks
Prudhviraj Naidu
Zixian Wang
Leon Bergen
R. Paturi
VLM
44
0
0
06 May 2025
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Zhiwei Xu
Zhiyu Ni
Yixin Wang
Wei Hu
CLL
22
0
0
17 Apr 2025
A Two-Phase Perspective on Deep Learning Dynamics
Robert de Mello Koch
Animik Ghosh
24
0
0
17 Apr 2025
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
Junlang Qian
Zixiao Zhu
Hanzhang Zhou
Zijian Feng
Zepeng Zhai
K. Mao
AAML
VLM
33
0
0
04 Apr 2025
Efficient Knowledge Distillation via Curriculum Extraction
Shivam Gupta
Sushrut Karmalkar
37
0
0
21 Mar 2025
Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions
Elisabetta Cornacchia
Dan Mikulincer
Elchanan Mossel
49
0
0
10 Feb 2025
Explaining Context Length Scaling and Bounds for Language Models
Jingzhe Shi
Qinwei Ma
Hongyi Liu
Hang Zhao
Jeng-Neng Hwang
Serge Belongie
Lei Li
LRM
62
2
0
03 Feb 2025
An Attempt to Unraveling Token Prediction Refinement and Identifying Essential Layers of Large Language Models
Jaturong Kongmanee
29
1
0
28 Jan 2025
Grokking at the Edge of Numerical Stability
Lucas Prieto
Melih Barsbey
Pedro A.M. Mediano
Tolga Birdal
32
3
0
08 Jan 2025
Exploring Grokking: Experimental and Mechanistic Investigations
Hu Qiye
Zhou Hao
Yu RuoXi
66
1
0
14 Dec 2024
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
Berfin Simsek
Amire Bendjeddou
Daniel Hsu
32
0
0
13 Nov 2024
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
C. Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin
MoMe
38
12
0
07 Nov 2024
Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
Tanishq Kumar
Blake Bordelon
C. Pehlevan
Venkatesh N. Murthy
Samuel Gershman
OOD
CLL
SSL
35
0
0
05 Nov 2024
Pretrained transformer efficiently learns low-dimensional target functions in-context
Kazusato Oko
Yujin Song
Taiji Suzuki
Denny Wu
23
4
0
04 Nov 2024
Abrupt Learning in Transformers: A Case Study on Matrix Completion
Pulkit Gopalani
Ekdeep Singh Lubana
Wei Hu
32
3
0
29 Oct 2024
Robust Feature Learning for Multi-Index Models in High Dimensions
Alireza Mousavi-Hosseini
Adel Javanmard
Murat A. Erdogdu
OOD
AAML
37
1
0
21 Oct 2024
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Hossein Taheri
Christos Thrampoulidis
Arya Mazumdar
MLT
16
0
0
13 Oct 2024
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Kaiyue Wen
Huaqing Zhang
Hongzhou Lin
Jingzhao Zhang
MoE
LRM
50
2
0
07 Oct 2024
Interpreting and Improving Large Language Models in Arithmetic Calculation
Wei Zhang
Chaoqun Wan
Yonggang Zhang
Yiu-ming Cheung
Xinmei Tian
Xu Shen
Jieping Ye
LRM
16
0
0
03 Sep 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis
Kumar Kshitij Patel
Samuel Wheeler
Pedro H. P. Savarese
Gal Vardi
Karen Livescu
Michael Maire
Matthew R. Walter
24
3
0
21 Aug 2024
Clustering and Alignment: Understanding the Training Dynamics in Modular Addition
Tiberiu Musat
16
1
0
18 Aug 2024
Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
Kenzo Clauw
S. Stramaglia
Daniele Marinazzo
37
3
0
16 Aug 2024
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
Neil Rohit Mallinar
Daniel Beaglehole
Libin Zhu
Adityanarayanan Radhakrishnan
Parthe Pandit
Misha Belkin
35
7
0
29 Jul 2024
Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
Mohamad Amin Mohamadi
Zhiyuan Li
Lei Wu
Danica J. Sutherland
25
1
0
17 Jul 2024
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park
Maya Okawa
Andrew Lee
Ekdeep Singh Lubana
Hidenori Tanaka
50
6
0
27 Jun 2024
Grokking Modular Polynomials
Darshil Doshi
Tianyu He
Aritra Das
Andrey Gromov
21
4
0
05 Jun 2024
Iteration Head: A Mechanistic Study of Chain-of-Thought
Vivien A. Cabannes
Charles Arnal
Wassim Bouaziz
Alice Yang
Francois Charton
Julia Kempe
LRM
14
7
0
04 Jun 2024
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Jaerin Lee
Bong Gyun Kang
Kihoon Kim
Kyoung Mu Lee
23
11
0
30 May 2024
Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi
Junyi Wei
Zhuoyan Xu
Yingyu Liang
24
18
0
30 May 2024
Survival of the Fittest Representation: A Case Study with Modular Addition
Xiaoman Delores Ding
Zifan Carl Guo
Eric J. Michaud
Ziming Liu
Max Tegmark
29
3
0
27 May 2024
Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation
Yeachan Park
Minseok Kim
Yeoneung Kim
19
1
0
26 May 2024
The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
Michael Munn
Benoit Dherin
Javier Gonzalvo
AAML
27
1
0
24 May 2024
A rationale from frequency perspective for grokking in training neural network
Zhangchen Zhou
Yaoyu Zhang
Z. Xu
28
2
0
24 May 2024
Progress Measures for Grokking on Real-world Tasks
Satvik Golechha
18
1
0
21 May 2024
Asymptotic theory of in-context learning by linear attention
Yue M. Lu
Mary I. Letey
Jacob A. Zavatone-Veth
Anindita Maiti
C. Pehlevan
19
10
0
20 May 2024
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Hanlin Zhu
Baihe Huang
Shaolun Zhang
Michael I. Jordan
Jiantao Jiao
Yuandong Tian
Stuart Russell
LRM
AI4CE
39
13
0
07 May 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
30
111
0
22 Apr 2024
Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent
Yiwen Kou
Zixiang Chen
Quanquan Gu
Sham Kakade
18
0
0
18 Apr 2024
Towards a theory of model distillation
Enric Boix-Adserà
FedML
VLM
39
5
0
14 Mar 2024
The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models
Adithya Bhaskar
Dan Friedman
Danqi Chen
19
1
0
06 Mar 2024
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
GuanWen Qiu
Da Kuang
Surbhi Goel
20
8
0
05 Mar 2024
Deep Networks Always Grok and Here is Why
Ahmed Imtiaz Humayun
Randall Balestriero
Richard Baraniuk
AAML
OOD
AI4CE
35
19
0
23 Feb 2024
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Benjamin L. Edelman
Ezra Edelman
Surbhi Goel
Eran Malach
Nikolaos Tsilivis
BDL
16
39
0
16 Feb 2024
Measuring Sharpness in Grokking
Jack Miller
Patrick Gleeson
Charles O'Neill
Thang Bui
Noam Levi
8
1
0
14 Feb 2024
Feature learning as alignment: a structural property of gradient descent in non-linear neural networks
Daniel Beaglehole
Ioannis Mitliagkas
Atish Agarwala
MLT
24
2
0
07 Feb 2024
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park
Jaeseung Park
Zheyang Xiong
Nayoung Lee
Jaewoong Cho
Samet Oymak
Kangwook Lee
Dimitris Papailiopoulos
11
31
0
06 Feb 2024
Carrying over algorithm in transformers
J. Kruthoff
6
0
0
15 Jan 2024
Grokking Group Multiplication with Cosets
Dashiell Stander
Qinan Yu
Honglu Fan
Stella Biderman
26
9
0
11 Dec 2023
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
AI4CE
20
32
0
30 Nov 2023