2205.10343
Cited By
Towards Understanding Grokking: An Effective Theory of Representation Learning
20 May 2022
Ziming Liu
O. Kitouni
Niklas Nolte
Eric J. Michaud
Max Tegmark
Mike Williams
AI4CE
Papers citing
"Towards Understanding Grokking: An Effective Theory of Representation Learning"
34 / 34 papers shown
Title
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
135
10
0
29 Apr 2025
Representation Learning on a Random Lattice
Aryeh Brill
OOD
FAtt
AI4CE
75
0
0
28 Apr 2025
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
Xinyu Zhou
Simin Fan
Martin Jaggi
Jie Fu
41
0
0
24 Apr 2025
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
Junlang Qian
Zixiao Zhu
Hanzhang Zhou
Zijian Feng
Zepeng Zhai
K. Mao
AAML
VLM
43
0
0
04 Apr 2025
Early Stopping Against Label Noise Without Validation Data
Suqin Yuan
Lei Feng
Tongliang Liu
NoLa
104
18
0
11 Feb 2025
Grokking Explained: A Statistical Phenomenon
B. W. Carvalho
Artur Garcez
Luís C. Lamb
Emílio Vital Brazil
69
0
0
03 Feb 2025
Harmonic Loss Trains Interpretable AI Models
David D. Baek
Ziming Liu
Riya Tyagi
Max Tegmark
97
2
0
03 Feb 2025
Physics of Skill Learning
Ziming Liu
Yizhou Liu
Eric J. Michaud
Jeff Gore
Max Tegmark
54
2
0
21 Jan 2025
How to explain grokking
S. V. Kozyrev
AI4CE
36
0
0
03 Jan 2025
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
123
4
0
29 Dec 2024
On Memorization of Large Language Models in Logical Reasoning
Chulin Xie
Yangsibo Huang
Chiyuan Zhang
Da Yu
Xinyun Chen
Bill Yuchen Lin
Bo Li
Badih Ghazi
Ravi Kumar
LRM
58
25
0
30 Oct 2024
Formation of Representations in Neural Networks
Liu Ziyin
Isaac Chuang
Tomer Galanti
T. Poggio
41
4
0
03 Oct 2024
Zero-shot forecasting of chaotic systems
Yuanzhao Zhang
William Gilpin
AI4TS
42
6
0
24 Sep 2024
Survival of the Fittest Representation: A Case Study with Modular Addition
Xiaoman Delores Ding
Zifan Carl Guo
Eric J. Michaud
Ziming Liu
Max Tegmark
50
3
0
27 May 2024
How Do Transformers "Do" Physics? Investigating the Simple Harmonic Oscillator
Subhash Kantamneni
Ziming Liu
Max Tegmark
19
2
0
23 May 2024
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini
Jacopo Staiano
Bruno Lepri
Raffaele Marino
MoE
34
1
0
13 Mar 2024
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato
Stavroula Mougiakakou
45
0
0
08 Mar 2024
Opening the AI black box: program synthesis via mechanistic interpretability
Eric J. Michaud
Isaac Liao
Vedang Lad
Ziming Liu
Anish Mudide
Chloe Loughridge
Zifan Carl Guo
Tara Rezaei Kheirkhah
Mateja Vukelić
Max Tegmark
25
12
0
07 Feb 2024
Grokking as Compression: A Nonlinear Complexity Perspective
Ziming Liu
Ziqian Zhong
Max Tegmark
38
9
0
09 Oct 2023
Grokking as a First Order Phase Transition in Two Layer Networks
Noa Rubin
Inbar Seroussi
Zohar Ringel
37
16
0
05 Oct 2023
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Zhiwei Xu
Yutong Wang
Spencer Frei
Gal Vardi
Wei Hu
MLT
28
24
0
04 Oct 2023
It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models
Xingcheng Xu
Zihao Pan
Haipeng Zhang
Yanqing Yang
LRM
26
2
0
16 Aug 2023
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri
Ximing Lu
Melanie Sclar
Xiang Lorraine Li
Liwei Jiang
...
Sean Welleck
Xiang Ren
Allyson Ettinger
Zaïd Harchaoui
Yejin Choi
ReLM
LRM
34
336
0
29 May 2023
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
Ziming Liu
Eric Gan
Max Tegmark
26
36
0
04 May 2023
Unifying Grokking and Double Descent
Peter W. Battaglia
David Raposo
Kelsey
42
31
0
10 Mar 2023
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
Bilal Chughtai
Lawrence Chan
Neel Nanda
21
96
0
06 Feb 2023
Progress measures for grokking via mechanistic interpretability
Neel Nanda
Lawrence Chan
Tom Lieberum
Jess Smith
Jacob Steinhardt
49
386
0
12 Jan 2023
Logical Tasks for Measuring Extrapolation and Rule Comprehension
Ippei Fujisawa
Ryota Kanai
ELM
LRM
28
4
0
14 Nov 2022
Grokking phase transitions in learning local rules with gradient descent
Bojan Žunkovič
E. Ilievski
63
17
0
26 Oct 2022
Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu
Eric J. Michaud
Max Tegmark
56
77
0
03 Oct 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova DasSarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
252
474
0
24 Sep 2022
Multi-scale Feature Learning Dynamics: Insights for Double Descent
Mohammad Pezeshki
Amartya Mitra
Yoshua Bengio
Guillaume Lajoie
61
25
0
06 Dec 2021
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin
Javier Sagastuy-Breña
Surya Ganguli
Daniel L. K. Yamins
Hidenori Tanaka
107
77
0
08 Dec 2020
Contrastive Representation Learning: A Framework and Review
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
SSL
AI4TS
186
687
0
10 Oct 2020