Scaling Laws for Neural Language Models

23 January 2020
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
ArXiv (abs) | PDF | HTML | HuggingFace (9 upvotes)

Papers citing "Scaling Laws for Neural Language Models"

50 / 4,145 papers shown
Efficient Federated Search for Retrieval-Augmented Generation using Lightweight Routing
R. Guerraoui
Anne-Marie Kermarrec
Diana Petrescu
Rafael Pires
Mathis Randl
M. Vos
Martijn de Vos
RALM
390
10
0
10 Apr 2026
MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data
Yaobin Ling
Xiaoqian Jiang
Yejin Kim
GAN, SyDa
537
10
0
10 Apr 2026
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
Xiaoyuan Zhu
Yaowen Ye
Tianyi Qiu
Hanlin Zhu
Sijun Tan
Ajraf Mannan
Jonathan Michala
Raluca A. Popa
Willie Neiswanger
AAML, MLAU, ALM
617
6
0
10 Apr 2026
Selection, Reflection and Self-Refinement: Revisit Reasoning Tasks via a Causal Lens
Yunlong Deng
Boyang Sun
Yan Li
Lingjing Kong
Zeyu Tang
Kun Zhang
Guangyi Chen
ReLM, LRM
188
0
0
30 Mar 2026
Neural Models and Language Model Prompting for the Multidimensional Evaluation of Open-Ended Conversations
Michelle Elizabeth
Alicja Kasicka
Natalia Krawczyk
Magalie Ochs
Gwénolé Lecorvé
Justyna Gromada
L. Rojas-Barahona
ALM, ELM, LRM
158
2
0
30 Mar 2026
Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos
Yayuan Li
Aadit Jain
Filippos Bellos
Jason J. Corso
EgoV
167
2
0
27 Mar 2026
Toward Storage-Aware Learning with Compressed Data An Empirical Exploratory Study on JPEG
Kichang Lee
Songkuk Kim
JaeYeon Park
Jeonggil Ko
205
1
0
24 Dec 2025
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
International Symposium on Computer Architecture (ISCA), 2025
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Huazuo Gao
...
Wenfeng Liang
Ying He
Yun Wang
Yuxuan Liu
Y. X. Wei
MoE
303
60
0
24 Dec 2025
RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs
Jonathan Geuter
Gregor Kornhardt
40
0
0
05 Dec 2025
Are LLMs Truly Multilingual? Exploring Zero-Shot Multilingual Capability of LLMs for Information Retrieval: An Italian Healthcare Use Case
Vignesh Kumar Kembu
Pierandrea Morandini
Marta Bianca Maria Ranzini
Antonino Nocera
94
2
0
04 Dec 2025
TRINITY: An Evolved LLM Coordinator
Jinglue Xu
Qi Sun
Peter Schwendeman
Stefan Nielsen
Edoardo Cetin
Yujin Tang
LLMAG
300
0
0
04 Dec 2025
SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation
Yu Yuan
Tharindu Wickremasinghe
Zeeshan Nadir
Xijun Wang
Yiheng Chi
Stanley H. Chan
VGen, LRM
306
1
0
03 Dec 2025
From FLOPs to Footprints: The Resource Cost of Artificial Intelligence
Sophia Falk
N. Corrêa
Sasha Luccioni
Lisa Biber-Freudenberger
Aimee van Wynsberghe
75
2
0
03 Dec 2025
A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models
X.Y. Han
Yuan Zhong
MoE
472
0
0
03 Dec 2025
PretrainZero: Reinforcement Active Pretraining
Xingrun Xing
Zhiyuan Fan
Jie Lou
G. Li
Jiajun Zhang
Debing Zhang
OffRL, AIMat, ReLM, LRM, AI4CE
528
2
0
03 Dec 2025
Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning
Ge-Peng Ji
Jingyi Liu
Deng-Ping Fan
Nick Barnes
LRM
325
2
0
03 Dec 2025
Nexus: Higher-Order Attention Mechanisms in Transformers
Hanting Chen
Chong Zhu
Kai Han
Yuchuan Tian
Yuchen Liang
Tianyu Guo
Xinghao Chen
Dacheng Tao
Yunhe Wang
402
0
0
03 Dec 2025
CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving
Zhijian Qiao
Zehuan Yu
Tong Li
Chih-Chung Chou
Wenchao Ding
Shaojie Shen
148
0
0
03 Dec 2025
LLM-Generated Ads: From Personalization Parity to Persuasion Superiority
Elyas Meguellati
Stefano Civelli
Lei Han
Abraham Bernstein
S. Sadiq
Gianluca Demartini
152
0
0
03 Dec 2025
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
Jingyang Ou
Jiaqi Han
Minkai Xu
Shaoxuan Xu
Jianwen Xie
Stefano Ermon
Yi Wu
Chongxuan Li
DiffM
180
9
0
03 Dec 2025
Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study
Yixuan Li
Yuhao Lu
Y. Liu
Liang Li
R. Ruffini
Di Li
Rong-Gen Cai
Xiaoyan Zhu
Wenbin Lin
Yu Wang
211
0
0
03 Dec 2025
Data Curation Through the Lens of Spectral Dynamics: Static Limits, Dynamic Acceleration, and Practical Oracles
Yizhou Zhang
Lun Du
182
0
0
02 Dec 2025
PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models
Róbert Belanec
Ivan Srba
Maria Bielikova
ALM
517
0
0
02 Dec 2025
The brain-AI convergence: Predictive and generative world models for general-purpose computation
Shogo Ohmae
Keiko Ohmae
140
0
0
02 Dec 2025
Perch 2.0 transfers 'whale' to underwater tasks
Andrea Burns
Lauren Harrell
B. V. Merrienboer
Vincent Dumoulin
Jenny Hamer
Tom Denton
47
0
0
02 Dec 2025
ZO-ASR: Zeroth-Order Fine-Tuning of Speech Foundation Models without Back-Propagation
Yuezhang Peng
Yu‐Xin Liu
Yao Li
S. Wang
Fei Wen
Xie Chen
174
0
0
01 Dec 2025
Neural Networks for Predicting Permeability Tensors of 2D Porous Media: Comparison of Convolution- and Transformer-based Architectures
Sigurd Vargdal
Paula Reis
Henrik Andersen Sveinsson
Gaute Linga
MedIm
257
0
0
01 Dec 2025
MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification
Xabier de Zuazo
Ibon Saratxaga
Eva Navas
115
2
0
01 Dec 2025
Ensemble Privacy Defense for Knowledge-Intensive LLMs against Membership Inference Attacks
Haowei Fu
Bo Ni
Han Xu
Kunpeng Liu
Dan Lin
Tyler Derr
153
0
0
01 Dec 2025
Silhouette-based Gait Foundation Model
Dingqiang Ye
Chao Fan
Kartik Narayan
Bingzhe Wu
Chengwen Luo
Jianqiang Li
Vishal M. Patel
96
0
0
30 Nov 2025
Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction
Jiazhen Liu
Mingkuan Feng
Long Chen
150
2
0
29 Nov 2025
SimScale: Learning to Drive via Real-World Simulation at Scale
Haochen Tian
Tianyu Li
Haochen Liu
Jiazhi Yang
Yihang Qiu
...
Liang Wang
Hangjun Ye
Tieniu Tan
Long Chen
Hongyang Li
206
8
0
28 Nov 2025
Pathryoshka: Compressing Pathology Foundation Models via Multi-Teacher Knowledge Distillation with Nested Embeddings
Christian Grashei
Christian Brechenmacher
Rao Muhammad Umer
Jingsong Liu
Carsten Marr
Ewa Szczurek
Peter Schuffler
108
0
0
28 Nov 2025
Experts are all you need: A Composable Framework for Large Language Model Inference
S. Sridharan
Sourjya Roy
A. Raghunathan
Kaushik Roy
MoE
227
0
0
28 Nov 2025
Rethinking Test Time Scaling for Flow-Matching Generative Models
Qingtao Yu
Changlin Song
Minghao Sun
Zhengyang Yu
Vinay Kumar Verma
Soumya Roy
Sumit Negi
Hongdong Li
Dylan Campbell
123
1
0
27 Nov 2025
An interpretable unsupervised representation learning for high precision measurement in particle physics
Xing-Jian Lv
De-Xing Miao
Zi-Jun Xu
Jian-Chun Wang
33
0
0
27 Nov 2025
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action
Zhen Fang
Zhuoyang Liu
Jiaming Liu
Hao Chen
Y. Zeng
Shiting Huang
Zehui Chen
L. Chen
Shanghang Zhang
Feng Zhao
LRM
147
4
0
27 Nov 2025
On the Origin of Algorithmic Progress in AI
Hans Gundlach
Alex Fogelson
Jayson Lynch
Ana Trisovic
Jonathan Rosenfeld
Anmol Sandhu
Neil Thompson
130
1
0
26 Nov 2025
Mechanisms of Non-Monotonic Scaling in Vision Transformers
Anantha Padmanaban Krishna Kumar
152
0
0
26 Nov 2025
Closed-Loop Transformers: Autoregressive Modeling as Iterative Latent Equilibrium
Akbar Anbar Jafari
G. Anbarjafari
102
2
0
26 Nov 2025
Emergent Lexical Semantics in Neural Language Models: Testing Martin's Law on LLM-Generated Text
Kai Kugler
141
0
0
26 Nov 2025
Deep Learning as a Convex Paradigm of Computation: Minimizing Circuit Size with ResNets
Arthur Jacot
UQCV
146
1
0
25 Nov 2025
HHFT: Hierarchical Heterogeneous Feature Transformer for Recommendation Systems
Liren Yu
Wenming Zhang
Silu Zhou
Zhixuan Zhang
Dan Ou
255
8
0
25 Nov 2025
Designing Preconditioners for SGD: Local Conditioning, Noise Floors, and Basin Stability
Mitchell Scott
Tianshi Xu
Z. Tang
Alexandra Pichette-Emmons
Qiang Ye
Y. Saad
Yuanzhe Xi
AI4CE
335
3
0
24 Nov 2025
Fewer Tokens, Greater Scaling: Self-Adaptive Visual Bases for Efficient and Expansive Representation Learning
Shawn Young
Xingyu Zeng
Lijian Xu
VLM
130
6
0
24 Nov 2025
Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data
Guillaume Braun
Bruno Loureiro
Ha Quang Minh
Masaaki Imaizumi
159
1
0
24 Nov 2025
Efficient Inference Using Large Language Models with Limited Human Data: Fine-Tuning then Rectification
Lei Wang
Zikun Ye
Jinglong Zhao
ALM
267
1
0
23 Nov 2025
Dealing with the Hard Facts of Low-Resource African NLP
Yacouba Diarra
Nouhoum Souleymane Coulibaly
Panga Azazia Kamaté
Madani Amadou Tall
Emmanuel Élisé Koné
Aymane Dembélé
Michael Leventhal
125
1
0
23 Nov 2025
Foundations of Artificial Intelligence Frameworks: Notion and Limits of AGI
Khanh Gia Bui
NAI, AI4CE
412
0
0
23 Nov 2025
SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization
Jianghao Wu
Yasmeen George
Jin Ye
Y. Wu
Daniel F. Schmidt
Jianfei Cai
LRM
162
4
0
22 Nov 2025