ResearchTrend.AI
Neural Networks Learn Statistics of Increasing Complexity
arXiv:2402.04362 · 6 February 2024
Nora Belrose, Quintin Pope, Lucia Quirke, Alex Troy Mallen, Xiaoli Z. Fern
Links: arXiv (abs), PDF, HTML, GitHub (34★)

Papers citing "Neural Networks Learn Statistics of Increasing Complexity" (16 papers shown)
Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks
Florian A. Hölzl, Daniel Rueckert, Georgios Kaissis
29 Oct 2025

Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale
J. Michaelov, Roger P. Levy, Benjamin Bergen
28 Oct 2025

Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy
Karthik Viswanathan, Sang Eon Park
05 Oct 2025

Convergence and Divergence of Language Models under Different Random Seeds
Finlay Fehlauer, Kyle Mahowald, Tiago Pimentel
30 Sep 2025

Tracing the Representation Geometry of Language Models from Pretraining to Post-training
Melody Zixuan Li, Kumar Krishna Agrawal, Arna Ghosh, Komal Kumar Teru, Adam Santoro, Guillaume Lajoie, Blake A. Richards
27 Sep 2025

Evolution of Concepts in Language Model Pre-Training
Xuyang Ge, Wentao Shu, Jiaxing Wu, Yunhua Zhou, Zhengfu He, Xipeng Qiu
21 Sep 2025

The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann, Tiago Pimentel
11 Jul 2025

A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
Kola Ayonrinde, Louis Jaburi
01 May 2025

PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
International Conference on Learning Representations (ICLR), 2025
Oskar van der Wal, Pietro Lesci, Max Muller-Eberstein, Naomi Saphra, Hailey Schoelkopf, Willem H. Zuidema, Stella Biderman
12 Mar 2025

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Michael Y. Hu, Jackson Petty, Chuan Shi, William Merrill, Tal Linzen
26 Feb 2025

A distributional simplicity bias in the learning dynamics of transformers
Neural Information Processing Systems (NeurIPS), 2024
Riccardo Rende, Federica Gerace, Alessandro Laio, Sebastian Goldt
17 Feb 2025

Tending Towards Stability: Convergence Challenges in Small Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Richard Diehl Martinez, Pietro Lesci, P. Buttery
15 Oct 2024

Jet Expansions of Residual Computation
Yihong Chen, Xiangxiang Xu, Yao Lu, Pontus Stenetorp, Luca Franceschi
08 Oct 2024

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
International Conference on Learning Representations (ICLR), 2024
George Wang, Jesse Hoogland, Stan van Wingerden, Zach Furman, Daniel Murfet
03 Oct 2024

Early learning of the optimal constant solution in neural networks and humans
Jirko Rubruck, Jan P. Bauer, Andrew M. Saxe, Christopher Summerfield
25 Jun 2024

Sliding down the stairs: how correlated latent variables accelerate learning with neural networks
Lorenzo Bardone, Sebastian Goldt
12 Apr 2024