Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, W. Fedus, J. Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler
arXiv 2207.10551 · 21 July 2022
Papers citing "Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?" (50 of 82 shown)

Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Ayan Sengupta, Yash Goel, Tanmoy Chakraborty · 02 May 2025

Scaling Laws for Data-Efficient Visual Transfer Learning
Wenxuan Yang, Qingqu Wei, Chenxi Ma, Weimin Tan, Bo Yan · 17 Apr 2025

RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang, Guoming Ling, Vincent S. Liang, Yupei Lin, Yandong Chen, Shanshan Zhong, Hefeng Wu, Liang Lin · [LRM] · 08 Mar 2025

(Mis)Fitting: A Survey of Scaling Laws
Margaret Li, Sneha Kudugunta, Luke Zettlemoyer · 26 Feb 2025

How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Ayan Sengupta, Yash Goel, Tanmoy Chakraborty · 17 Feb 2025

Predicting Emergent Capabilities by Finetuning
Charlie Snell, Eric Wallace, Dan Klein, Sergey Levine · [ELM, LRM] · 25 Nov 2024

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez, Luca Wehrstedt, Leonid Shamis, Mostafa Elhoushi, Kalyan Saladi, Yonatan Bisk, Emma Strubell, Jacob Kahn · 20 Nov 2024

Scaling Laws for Precision
Tanishq Kumar, Zachary Ankner, Benjamin Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, C. Pehlevan, Christopher Ré, Aditi Raghunathan · [AIFin, MoMe] · 07 Nov 2024

Does equivariance matter at scale?
Johann Brehmer, S. Behrends, P. D. Haan, Taco S. Cohen · 30 Oct 2024

A Hitchhiker's Guide to Scaling Law Estimation
Leshem Choshen, Yang Zhang, Jacob Andreas · 15 Oct 2024

A Theoretical Survey on Foundation Models
Shi Fu, Yuzhu Chen, Yingjie Wang, Dacheng Tao · 15 Oct 2024

Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi, Antonio Orvieto, Seyed-Mohsen Moosavi-Dezfooli · [AAML, AI4CE] · 15 Oct 2024

Understanding the Interplay of Scale, Data, and Bias in Language Models: A Case Study with BERT
Muhammad Ali, Swetasudha Panda, Qinlan Shen, Michael Wick, Ari Kobren · [MILM] · 25 Jul 2024

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Chaofan Tao, Qian Liu, Longxu Dou, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min-Bin Lin, Ngai Wong · [PILM] · 18 Jul 2024

Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale
Wenzhen Zheng, Wenbo Pan, Xu Xu, Libo Qin, Li Yue, Ming Zhou · [CLL] · 02 Jul 2024

Unveiling and Controlling Anomalous Attention Distribution in Transformers
Ruiqing Yan, Xingbo Du, Haoyu Deng, Linghan Zheng, Qiuzhuang Sun, Jifang Hu, Yuhang Shao, Penghao Jiang, Jinrong Jiang, Lian Zhao · 26 Jun 2024

Evidence of a log scaling law for political persuasion with large language models
Kobi Hackenburg, Ben M. Tappin, Paul Röttger, Scott Hale, Jonathan Bright, Helen Z. Margetts · 20 Jun 2024

MoEUT: Mixture-of-Experts Universal Transformers
Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning · [MoE] · 25 May 2024

Base of RoPE Bounds Context Length
Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen · 23 May 2024

Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Woomin Song, Seunghyuk Oh, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, Jung-Woo Ha, Jinwoo Shin · 16 Apr 2024

TransformerFAM: Feedback attention is working memory
Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, K. Sim, P. M. Mengibar · 14 Apr 2024

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou · 12 Apr 2024

Rho-1: Not All Tokens Are What You Need
Zheng-Wen Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, ..., Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen · [CLL] · 11 Apr 2024

Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?
Vagrant Gautam, Eileen Bingert, D. Zhu, Anne Lauscher, Dietrich Klakow · 04 Apr 2024

Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du, Aohan Zeng, Yuxiao Dong, Jie Tang · [UQCV, LRM] · 23 Mar 2024

Language models scale reliably with over-training and on downstream tasks
S. Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, ..., Y. Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt · [ALM, ELM, LRM] · 13 Mar 2024

Algorithmic progress in language models
Anson Ho, T. Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, J. Sevilla · 09 Mar 2024

How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider Transformer Models
Xin Lu, Yanyan Zhao, Bing Qin · 04 Mar 2024

Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
Mosh Levy, Alon Jacoby, Yoav Goldberg · 19 Feb 2024

Learning Low-Rank Feature for Thorax Disease Classification
Rajeev Goel, Utkarsh Nath, Yancheng Wang, Alvin C. Silva, Teresa Wu, Yingzhen Yang · 14 Feb 2024

Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea · 01 Feb 2024

Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana, Jacob P. Portes, Sasha Doubov, Jonathan Frankle · [LRM] · 31 Dec 2023

The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang · 01 Dec 2023

xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data
Jing Gong, Minsheng Hao, Xingyi Cheng, Xin Zeng, Chiming Liu, Jianzhu Ma, Xuegong Zhang, Taifeng Wang, Leo T. Song · 26 Nov 2023

Alignment is not sufficient to prevent large language models from generating harmful information: A psychoanalytic perspective
Zi Yin, Wei Ding, Jia Liu · 14 Nov 2023

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann · 06 Nov 2023

KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training
Truong Thao Nguyen, Balazs Gerofi, Edgar Josafat Martinez-Noriega, François Trahay, M. Wahib · 16 Oct 2023

Sparse Universal Transformer
Shawn Tan, Yikang Shen, Zhenfang Chen, Aaron Courville, Chuang Gan · [MoE] · 11 Oct 2023

How Physics and Background Attributes Impact Video Transformers in Robotic Manipulation: A Case Study on Planar Pushing
Shutong Jin, Ruiyu Wang, Muhammad Zahid, Florian T. Pokorny · 03 Oct 2023

Ring Attention with Blockwise Transformers for Near-Infinite Context
Hao Liu, Matei A. Zaharia, Pieter Abbeel · 03 Oct 2023

The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić, Dylan R. Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag · [MoE] · 20 Sep 2023

A Hierarchical Neural Framework for Classification and its Explanation in Large Unstructured Legal Documents
Nishchal Prasad, M. Boughanem, Taoufik Dkaki · [ELM, AILaw] · 19 Sep 2023

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs
Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, ..., Rongfei Zeng, Kaiyong Zhao, S. Shi, Bingsheng He, Xiaowen Chu · 03 Sep 2023

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers
Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du · 25 Aug 2023

Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature
Walter Hernandez Cruz, K. Tylinski, Alastair Moore, Niall Roche, Nikhil Vadgama, Horst Treiblmaier, J. Shangguan, Paolo Tasca, Jiahua Xu · 23 Aug 2023

Deep learning-based denoising streamed from mobile phones improves speech-in-noise understanding for hearing aid users
P. U. Diehl, Hannes Zilly, Felix Sattler, Y. Singer, Kevin Kepp, ..., Paul Meyer-Rachner, A. Pudszuhn, V. Hofmann, M. Vormann, Elias Sprengel · 22 Aug 2023

No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner · 12 Jul 2023

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris J. Maddison, Daniel M. Roy · 30 Jun 2023

Rethink DARTS Search Space and Renovate a New Benchmark
Jiuling Zhang, Zhiming Ding · 12 Jun 2023

Blockwise Parallel Transformer for Large Context Models
Hao Liu, Pieter Abbeel · 30 May 2023