arXiv:2203.15556
Training Compute-Optimal Large Language Models
29 March 2022
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
Eliza Rutherford
Diego de Las Casas
Lisa Anne Hendricks
Johannes Welbl
Aidan Clark
Tom Hennigan
Eric Noland
Katie Millican
George van den Driessche
Bogdan Damoc
Aurelia Guy
Simon Osindero
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
Papers citing "Training Compute-Optimal Large Language Models"
50 / 316 papers shown
Adaptive Computation with Elastic Input Sequence
Fuzhao Xue
Valerii Likhosherstov
Anurag Arnab
N. Houlsby
Mostafa Dehghani
Yang You
27
18
0
30 Jan 2023
Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus
Alex Warstadt
Leshem Choshen
Aaron Mueller
Adina Williams
Ethan Gotlieb Wilcox
Chengxu Zhuang
11
54
0
27 Jan 2023
Probing Out-of-Distribution Robustness of Language Models with Parameter-Efficient Transfer Learning
Hyunsoo Cho
Choonghyun Park
Junyeop Kim
Hyuhng Joon Kim
Kang Min Yoo
Sang-goo Lee
OODD
10
3
0
27 Jan 2023
Projected Subnetworks Scale Adaptation
Siddhartha Datta
N. Shadbolt
VLM
CLL
16
0
0
27 Jan 2023
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
Hongpeng Lin
Ludan Ruan
Wenke Xia
Peiyu Liu
Jing Wen
...
Di Hu
Ruihua Song
Wayne Xin Zhao
Qin Jin
Zhiwu Lu
VGen
27
9
0
14 Jan 2023
Data Distillation: A Survey
Noveen Sachdeva
Julian McAuley
DD
25
73
0
11 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
75
34
0
01 Jan 2023
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu
Tri Dao
Khaled Kamal Saab
A. Thomas
Atri Rudra
Christopher Ré
43
367
0
28 Dec 2022
Task Ambiguity in Humans and Language Models
Alex Tamkin
Kunal Handa
Ava Shrestha
Noah D. Goodman
UQLM
24
22
0
20 Dec 2022
Gradient flow in the gaussian covariate model: exact solution of learning curves and multiple descent structures
Antione Bodin
N. Macris
19
4
0
13 Dec 2022
General-Purpose In-Context Learning by Meta-Learning Transformers
Louis Kirsch
James Harrison
Jascha Narain Sohl-Dickstein
Luke Metz
27
72
0
08 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
39
422
0
08 Dec 2022
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya
Elad Venezian
Colin Raffel
Noam Slonim
Yoav Katz
Leshem Choshen
MoMe
16
52
0
02 Dec 2022
A Pipeline for Generating, Annotating and Employing Synthetic Data for Real World Question Answering
Matthew Maufe
James Ravenscroft
Rob Procter
Maria Liakata
24
3
0
30 Nov 2022
Fine-tuning language models to find agreement among humans with diverse preferences
Michiel A. Bakker
Martin Chadwick
Hannah R. Sheahan
Michael Henry Tessler
Lucy Campbell-Gillingham
...
Nat McAleese
Amelia Glaese
John Aslanides
M. Botvinick
Christopher Summerfield
ALM
10
214
0
28 Nov 2022
Understanding BLOOM: An empirical study on diverse NLP tasks
Parag Dakle
Sai Krishna Rallabandi
Preethi Raghavan
AI4CE
31
3
0
27 Nov 2022
Retrieval-Augmented Multimodal Language Modeling
Michihiro Yasunaga
Armen Aghajanyan
Weijia Shi
Rich James
J. Leskovec
Percy Liang
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
11
95
0
22 Nov 2022
GAMMT: Generative Ambiguity Modeling Using Multiple Transformers
Xingcheng Xu
14
0
0
16 Nov 2022
Efficiently Scaling Transformer Inference
Reiner Pope
Sholto Douglas
Aakanksha Chowdhery
Jacob Devlin
James Bradbury
Anselm Levskaya
Jonathan Heek
Kefan Xiao
Shivani Agrawal
J. Dean
21
292
0
09 Nov 2022
Changes from Classical Statistics to Modern Statistics and Data Science
Kai Zhang
Shan-Yu Liu
M. Xiong
21
0
0
30 Oct 2022
A Solvable Model of Neural Scaling Laws
A. Maloney
Daniel A. Roberts
J. Sully
29
51
0
30 Oct 2022
Precision Machine Learning
Eric J. Michaud
Ziming Liu
Max Tegmark
11
34
0
24 Oct 2022
Composing Ensembles of Pre-trained Models via Iterative Consensus
Shuang Li
Yilun Du
J. Tenenbaum
Antonio Torralba
Igor Mordatch
MoMe
19
23
0
20 Oct 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
56
2,978
0
20 Oct 2022
Transcending Scaling Laws with 0.1% Extra Compute
Yi Tay
Jason W. Wei
Hyung Won Chung
Vinh Q. Tran
David R. So
...
Donald Metzler
Slav Petrov
N. Houlsby
Quoc V. Le
Mostafa Dehghani
LRM
29
68
0
20 Oct 2022
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Mirac Suzgun
Nathan Scales
Nathanael Schärli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
69
988
0
17 Oct 2022
An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification
Ilias Chalkidis
Xiang Dai
Manos Fergadiotis
Prodromos Malakasiotis
Desmond Elliott
30
33
0
11 Oct 2022
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
S. Kwon
Jeonghoon Kim
Jeongin Bae
Kang Min Yoo
Jin-Hwa Kim
Baeseong Park
Byeongwook Kim
Jung-Woo Ha
Nako Sung
Dongsoo Lee
MQ
21
30
0
08 Oct 2022
Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
Mansheej Paul
F. Chen
Brett W. Larsen
Jonathan Frankle
Surya Ganguli
Gintare Karolina Dziugaite
UQCV
17
38
0
06 Oct 2022
Ask Me Anything: A simple strategy for prompting language models
Simran Arora
A. Narayan
Mayee F. Chen
Laurel J. Orr
Neel Guha
Kush S. Bhatia
Ines Chami
Frederic Sala
Christopher Ré
ReLM
LRM
206
205
0
05 Oct 2022
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Skanda Koppula
Yazhe Li
Evan Shelhamer
Andrew Jaegle
Nikhil Parthasarathy
Relja Arandjelović
João Carreira
Olivier J. Hénaff
23
9
0
30 Sep 2022
Scaling Laws for a Multi-Agent Reinforcement Learning Model
Oren Neumann
C. Gros
24
26
0
29 Sep 2022
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging
Jean Kaddour
MoMe
3DH
17
39
0
29 Sep 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
500
0
28 Sep 2022
Variational Open-Domain Question Answering
Valentin Liévin
Andreas Geert Motzfeldt
Ida Riis Jensen
Ole Winther
OOD
BDL
26
8
0
23 Sep 2022
Faithful Reasoning Using Large Language Models
Antonia Creswell
Murray Shanahan
ReLM
LRM
13
120
0
30 Aug 2022
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP
VLM
38
97
0
10 Aug 2022
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan
Shankar Ananthakrishnan
Jack G. M. FitzGerald
Rahul Gupta
Wael Hamza
...
Mukund Sridhar
Fabian Triefenbach
Apurv Verma
Gökhan Tür
Premkumar Natarajan
34
82
0
02 Aug 2022
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Boaz Barak
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
25
123
0
18 Jul 2022
Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos
J. Sevilla
T. Besiroglu
Lennart Heim
A. Ho
Marius Hobbhahn
ALM
ELM
AI4CE
18
56
0
05 Jul 2022
Measuring Forgetting of Memorized Training Examples
Matthew Jagielski
Om Thakkar
Florian Tramèr
Daphne Ippolito
Katherine Lee
...
Eric Wallace
Shuang Song
Abhradeep Thakurta
Nicolas Papernot
Chiyuan Zhang
TDI
40
102
0
30 Jun 2022
Solving Quantitative Reasoning Problems with Language Models
Aitor Lewkowycz
Anders Andreassen
David Dohan
Ethan Dyer
Henryk Michalewski
...
Theo Gutman-Solo
Yuhuai Wu
Behnam Neyshabur
Guy Gur-Ari
Vedant Misra
ReLM
ELM
LRM
36
738
0
29 Jun 2022
Long Range Language Modeling via Gated State Spaces
Harsh Mehta
Ankit Gupta
Ashok Cutkosky
Behnam Neyshabur
Mamba
26
231
0
27 Jun 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
88
1,061
0
22 Jun 2022
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
Karthik Valmeekam
Matthew Marquez
Alberto Olmo
S. Sreedharan
Subbarao Kambhampati
ReLM
LRM
11
197
0
21 Jun 2022
Towards Understanding How Machines Can Learn Causal Overhypotheses
Eliza Kosoy
David M. Chan
Adrian Liu
Jasmine Collins
Bryanna Kaufmann
Sandy Han Huang
Jessica B. Hamrick
John F. Canny
Nan Rosemary Ke
Alison Gopnik
CML
AI4CE
18
18
0
16 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
34
226
0
16 Jun 2022
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
19
95
0
13 Jun 2022
A Generalist Agent
Scott E. Reed
Konrad Zolna
Emilio Parisotto
Sergio Gomez Colmenarejo
Alexander Novikov
...
Yutian Chen
R. Hadsell
Oriol Vinyals
Mahyar Bordbar
Nando de Freitas
LM&Ro
LLMAG
AI4CE
46
783
0
12 May 2022
Adversarial Training for High-Stakes Reliability
Daniel M. Ziegler
Seraphina Nix
Lawrence Chan
Tim Bauman
Peter Schmidt-Nielsen
...
Noa Nabeshima
Benjamin Weinstein-Raun
D. Haas
Buck Shlegeris
Nate Thomas
AAML
22
59
0
03 May 2022