
arXiv:2402.04362

Neural Networks Learn Statistics of Increasing Complexity


6 February 2024

Alex Troy Mallen


Papers citing "Neural Networks Learn Statistics of Increasing Complexity"

16 / 16 papers shown

Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks


Florian A. Hölzl

Daniel Rueckert

Georgios Kaissis

182

0

0

29 Oct 2025

Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale


Benjamin Bergen

167

8

0

28 Oct 2025

Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy


Karthik Viswanathan

163

0

0

05 Oct 2025

Convergence and Divergence of Language Models under Different Random Seeds


Finlay Fehlauer

207

1

0

30 Sep 2025

Tracing the Representation Geometry of Language Models from Pretraining to Post-training


Melody Zixuan Li

Kumar Krishna Agrawal

Komal Kumar Teru

Guillaume Lajoie

Blake A. Richards

282

15

0

27 Sep 2025

Evolution of Concepts in Language Model Pre-Training


159

4

0

21 Sep 2025

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?


247

14

0

11 Jul 2025

A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i


613

4

0

01 May 2025

PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs

International Conference on Learning Representations (ICLR), 2025

Oskar van der Wal

Max Muller-Eberstein

Hailey Schoelkopf

Willem H. Zuidema

Stella Biderman

479

23

0

12 Mar 2025

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

William Merrill

436

15

0

26 Feb 2025

A distributional simplicity bias in the learning dynamics of transformers

Neural Information Processing Systems (NeurIPS), 2024

Federica Gerace

Alessandro Laio

Sebastian Goldt

532

20

0

17 Feb 2025

Tending Towards Stability: Convergence Challenges in Small Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Richard Diehl Martinez

263

8

0

15 Oct 2024

Jet Expansions of Residual Computation


Yao Lu

Pontus Stenetorp

Luca Franceschi

231

5

0

08 Oct 2024

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

International Conference on Learning Representations (ICLR), 2024

Stan van Wingerden

297

28

0

03 Oct 2024

Early learning of the optimal constant solution in neural networks and humans

Christopher Summerfield

452

6

0

25 Jun 2024

Sliding down the stairs: how correlated latent variables accelerate learning with neural networks

Lorenzo Bardone

Sebastian Goldt

329

13

0

12 Apr 2024
