Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

18 July 2022
Boaz Barak
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang

Papers citing "Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit"

50 / 108 papers shown
New Statistical and Computational Results for Learning Junta Distributions
Lorenzo Beretta
21
0
0
09 May 2025
Quiet Feature Learning in Algorithmic Tasks
Prudhviraj Naidu
Zixian Wang
Leon Bergen
R. Paturi
VLM
49
0
0
06 May 2025
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Zhiwei Xu
Zhiyu Ni
Yixin Wang
Wei Hu
CLL
27
0
0
17 Apr 2025
A Two-Phase Perspective on Deep Learning Dynamics
Robert de Mello Koch
Animik Ghosh
29
0
0
17 Apr 2025
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
Junlang Qian
Zixiao Zhu
Hanzhang Zhou
Zijian Feng
Zepeng Zhai
K. Mao
AAML
VLM
35
0
0
04 Apr 2025
Efficient Knowledge Distillation via Curriculum Extraction
Shivam Gupta
Sushrut Karmalkar
37
0
0
21 Mar 2025
Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions
Elisabetta Cornacchia
Dan Mikulincer
Elchanan Mossel
51
0
0
10 Feb 2025
Explaining Context Length Scaling and Bounds for Language Models
Jingzhe Shi
Qinwei Ma
Hongyi Liu
Hang Zhao
Jeng-Neng Hwang
Serge Belongie
Lei Li
LRM
62
2
0
03 Feb 2025
An Attempt to Unraveling Token Prediction Refinement and Identifying Essential Layers of Large Language Models
Jaturong Kongmanee
34
1
0
28 Jan 2025
Grokking at the Edge of Numerical Stability
Lucas Prieto
Melih Barsbey
Pedro A.M. Mediano
Tolga Birdal
32
3
0
08 Jan 2025
Exploring Grokking: Experimental and Mechanistic Investigations
Hu Qiye
Zhou Hao
Yu RuoXi
71
1
0
14 Dec 2024
Learning Gaussian Multi-Index Models with Gradient Flow: Time Complexity and Directional Convergence
Berfin Simsek
Amire Bendjeddou
Daniel Hsu
32
0
0
13 Nov 2024
Scaling Laws for Precision
Tanishq Kumar
Zachary Ankner
Benjamin Spector
Blake Bordelon
Niklas Muennighoff
Mansheej Paul
C. Pehlevan
Christopher Ré
Aditi Raghunathan
AIFin
MoMe
46
12
0
07 Nov 2024
Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
Tanishq Kumar
Blake Bordelon
C. Pehlevan
Venkatesh N. Murthy
Samuel Gershman
OOD
CLL
SSL
43
0
0
05 Nov 2024
Pretrained transformer efficiently learns low-dimensional target functions in-context
Kazusato Oko
Yujin Song
Taiji Suzuki
Denny Wu
25
4
0
04 Nov 2024
Abrupt Learning in Transformers: A Case Study on Matrix Completion
Pulkit Gopalani
Ekdeep Singh Lubana
Wei Hu
40
3
0
29 Oct 2024
Robust Feature Learning for Multi-Index Models in High Dimensions
Alireza Mousavi-Hosseini
Adel Javanmard
Murat A. Erdogdu
OOD
AAML
37
1
0
21 Oct 2024
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Hossein Taheri
Christos Thrampoulidis
Arya Mazumdar
MLT
21
0
0
13 Oct 2024
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Kaiyue Wen
Huaqing Zhang
Hongzhou Lin
Jingzhao Zhang
MoE
LRM
52
2
0
07 Oct 2024
Interpreting and Improving Large Language Models in Arithmetic Calculation
Wei Zhang
Chaoqun Wan
Yonggang Zhang
Yiu-ming Cheung
Xinmei Tian
Xu Shen
Jieping Ye
LRM
16
0
0
03 Sep 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis
Kumar Kshitij Patel
Samuel Wheeler
Pedro H. P. Savarese
Gal Vardi
Karen Livescu
Michael Maire
Matthew R. Walter
29
3
0
21 Aug 2024
Clustering and Alignment: Understanding the Training Dynamics in Modular Addition
Tiberiu Musat
16
1
0
18 Aug 2024
Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
Kenzo Clauw
S. Stramaglia
Daniele Marinazzo
40
3
0
16 Aug 2024
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
Neil Rohit Mallinar
Daniel Beaglehole
Libin Zhu
Adityanarayanan Radhakrishnan
Parthe Pandit
Misha Belkin
37
7
0
29 Jul 2024
Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
Mohamad Amin Mohamadi
Zhiyuan Li
Lei Wu
Danica J. Sutherland
25
1
0
17 Jul 2024
Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park
Maya Okawa
Andrew Lee
Ekdeep Singh Lubana
Hidenori Tanaka
50
6
0
27 Jun 2024
Grokking Modular Polynomials
Darshil Doshi
Tianyu He
Aritra Das
Andrey Gromov
23
4
0
05 Jun 2024
Iteration Head: A Mechanistic Study of Chain-of-Thought
Vivien A. Cabannes
Charles Arnal
Wassim Bouaziz
Alice Yang
Francois Charton
Julia Kempe
LRM
21
7
0
04 Jun 2024
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Jaerin Lee
Bong Gyun Kang
Kihoon Kim
Kyoung Mu Lee
25
11
0
30 May 2024
Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi
Junyi Wei
Zhuoyan Xu
Yingyu Liang
31
18
0
30 May 2024
Survival of the Fittest Representation: A Case Study with Modular Addition
Xiaoman Delores Ding
Zifan Carl Guo
Eric J. Michaud
Ziming Liu
Max Tegmark
29
3
0
27 May 2024
Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation
Yeachan Park
Minseok Kim
Yeoneung Kim
21
1
0
26 May 2024
The Impact of Geometric Complexity on Neural Collapse in Transfer Learning
Michael Munn
Benoit Dherin
Javier Gonzalvo
AAML
30
1
0
24 May 2024
A rationale from frequency perspective for grokking in training neural network
Zhangchen Zhou
Yaoyu Zhang
Z. Xu
28
2
0
24 May 2024
Progress Measures for Grokking on Real-world Tasks
Satvik Golechha
18
1
0
21 May 2024
Asymptotic theory of in-context learning by linear attention
Yue M. Lu
Mary I. Letey
Jacob A. Zavatone-Veth
Anindita Maiti
C. Pehlevan
19
10
0
20 May 2024
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
Hanlin Zhu
Baihe Huang
Shaolun Zhang
Michael I. Jordan
Jiantao Jiao
Yuandong Tian
Stuart Russell
LRM
AI4CE
41
13
0
07 May 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
35
111
0
22 Apr 2024
Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent
Yiwen Kou
Zixiang Chen
Quanquan Gu
Sham Kakade
18
0
0
18 Apr 2024
Towards a theory of model distillation
Enric Boix-Adserà
FedML
VLM
44
6
0
14 Mar 2024
The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models
Adithya Bhaskar
Dan Friedman
Danqi Chen
21
1
0
06 Mar 2024
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
GuanWen Qiu
Da Kuang
Surbhi Goel
25
8
0
05 Mar 2024
Deep Networks Always Grok and Here is Why
Ahmed Imtiaz Humayun
Randall Balestriero
Richard Baraniuk
AAML
OOD
AI4CE
37
19
0
23 Feb 2024
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Benjamin L. Edelman
Ezra Edelman
Surbhi Goel
Eran Malach
Nikolaos Tsilivis
BDL
16
39
0
16 Feb 2024
Measuring Sharpness in Grokking
Jack Miller
Patrick Gleeson
Charles O'Neill
Thang Bui
Noam Levi
8
1
0
14 Feb 2024
Feature learning as alignment: a structural property of gradient descent in non-linear neural networks
Daniel Beaglehole
Ioannis Mitliagkas
Atish Agarwala
MLT
31
2
0
07 Feb 2024
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park
Jaeseung Park
Zheyang Xiong
Nayoung Lee
Jaewoong Cho
Samet Oymak
Kangwook Lee
Dimitris Papailiopoulos
13
31
0
06 Feb 2024
Carrying over algorithm in transformers
J. Kruthoff
13
0
0
15 Jan 2024
Grokking Group Multiplication with Cosets
Dashiell Stander
Qinan Yu
Honglu Fan
Stella Biderman
28
9
0
11 Dec 2023
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
AI4CE
20
32
0
30 Nov 2023