arXiv:2309.11197
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
20 September 2023
Aleksandar Stanić
Dylan R. Ashley
Oleg Serikov
Louis Kirsch
Francesco Faccio
Jürgen Schmidhuber
Thomas Hofmann
Imanol Schlag
MoE
Papers citing "The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute"
12 / 12 papers shown
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha
Brandon Reagen
OffRL
AI4CE
15
0
0
12 Oct 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
19
3
0
29 May 2024
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
Jerome Sieber
Carmen Amo Alonso
A. Didier
M. Zeilinger
Antonio Orvieto
AAML
39
7
0
24 May 2024
Large Language Model Programs
Imanol Schlag
Sainbayar Sukhbaatar
Asli Celikyilmaz
Wen-tau Yih
Jason Weston
Jürgen Schmidhuber
Xian Li
LRM
24
14
0
09 May 2023
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
135
137
0
24 Apr 2023
Your Transformer May Not be as Powerful as You Expect
Shengjie Luo
Shanda Li
Shuxin Zheng
Tie-Yan Liu
Liwei Wang
Di He
52
50
0
26 May 2022
SCENIC: A JAX Library for Computer Vision Research and Beyond
Mostafa Dehghani
A. Gritsenko
Anurag Arnab
Matthias Minderer
Yi Tay
38
67
0
18 Oct 2021
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press
Noah A. Smith
M. Lewis
234
690
0
27 Aug 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
Meta Learning Backpropagation And Improving It
Louis Kirsch
Jürgen Schmidhuber
34
53
0
29 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
243
1,791
0
17 Sep 2019