2404.07647
Cited By
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
11 April 2024
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
Papers citing "Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck"
4 / 4 papers shown
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu
Xiang Li
Dongqi Cai
Rongjie Yi
Fangming Liu
Xiwen Zhang
Nicholas D. Lane
Mengwei Xu
ObjD
LRM
47
31
0
24 Sep 2024
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency
Giovanni Puccetti
Anna Rogers
Aleksandr Drozd
F. Dell’Orletta
63
42
0
23 May 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
239
1,508
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020