ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.15556
  4. Cited By
Training Compute-Optimal Large Language Models

Training Compute-Optimal Large Language Models

29 March 2022
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
Eliza Rutherford
Diego de Las Casas
Lisa Anne Hendricks
Johannes Welbl
Aidan Clark
Tom Hennigan
Eric Noland
Katie Millican
George van den Driessche
Bogdan Damoc
Aurelia Guy
Simon Osindero
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
    AI4TS
ArXivPDFHTML

Papers citing "Training Compute-Optimal Large Language Models"

50 / 316 papers shown
Title
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
37
60
0
25 Mar 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCV
LRM
62
46
0
23 Mar 2024
Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and
  Image Embeddings
Synth2^22: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings
Sahand Sharifzadeh
Christos Kaplanis
Shreya Pathak
D. Kumaran
Anastasija Ilić
Jovana Mitrović
Charles Blundell
Andrea Banino
VLM
32
9
0
12 Mar 2024
Yi: Open Foundation Models by 01.AI
Yi: Open Foundation Models by 01.AI
01. AI
Alex Young
01.AI Alex Young
Bei Chen
Chao Li
...
Yue Wang
Yuxuan Cai
Zhenyu Gu
Zhiyuan Liu
Zonghong Dai
OSLM
LRM
121
495
0
07 Mar 2024
Do Large Language Models Mirror Cognitive Language Processing?
Do Large Language Models Mirror Cognitive Language Processing?
Yuqi Ren
Renren Jin
Tongxuan Zhang
Deyi Xiong
36
4
0
28 Feb 2024
VL-Trojan: Multimodal Instruction Backdoor Attacks against
  Autoregressive Visual Language Models
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
Jiawei Liang
Siyuan Liang
Man Luo
Aishan Liu
Dongchen Han
Ee-Chien Chang
Xiaochun Cao
38
37
0
21 Feb 2024
Large Language Models: A Survey
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
120
353
0
09 Feb 2024
Pretrained Generative Language Models as General Learning Frameworks for
  Sequence-Based Tasks
Pretrained Generative Language Models as General Learning Frameworks for Sequence-Based Tasks
Ben Fauber
11
2
0
08 Feb 2024
Learning Universal Predictors
Learning Universal Predictors
Jordi Grau-Moya
Tim Genewein
Marcus Hutter
Laurent Orseau
Grégoire Delétang
...
Anian Ruoss
Wenliang Kevin Li
Christopher Mattern
Matthew Aitchison
J. Veness
19
11
0
26 Jan 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple
  Decoding Heads
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
De-huai Chen
Tri Dao
29
246
0
19 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
48
35
0
16 Jan 2024
LLMs Perform Poorly at Concept Extraction in Cyber-security Research
  Literature
LLMs Perform Poorly at Concept Extraction in Cyber-security Research Literature
Maxime Wursch
Andrei Kucharavy
Dimitri Percia David
Alain Mermoud
6
4
0
12 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on
  Synthetic, Interpretable Tasks
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
27
6
0
21 Nov 2023
When Is Multilinguality a Curse? Language Modeling for 250 High- and
  Low-Resource Languages
When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages
Tyler A. Chang
Catherine Arnett
Zhuowen Tu
Benjamin Bergen
LRM
20
7
0
15 Nov 2023
OpenForest: A data catalogue for machine learning in forest monitoring
OpenForest: A data catalogue for machine learning in forest monitoring
Arthur Ouaknine
T. Kattenborn
Etienne Laliberté
David Rolnick
41
5
0
01 Nov 2023
Enhancing Zero-Shot Crypto Sentiment with Fine-tuned Language Model and
  Prompt Engineering
Enhancing Zero-Shot Crypto Sentiment with Fine-tuned Language Model and Prompt Engineering
Rahman S. M. Wahidur
Ishmam Tashdeed
Manjit Kaur
Heung-No Lee
ALM
25
17
0
20 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
22
1
0
15 Oct 2023
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Boxin Wang
Wei Ping
Lawrence C. McAfee
Peng-Tao Xu
Bo Li
M. Shoeybi
Bryan Catanzaro
RALM
16
45
0
11 Oct 2023
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
Yupei Du
Albert Gatt
Dong Nguyen
19
1
0
10 Oct 2023
Divide-and-Conquer Dynamics in AI-Driven Disempowerment
Divide-and-Conquer Dynamics in AI-Driven Disempowerment
Peter S. Park
Max Tegmark
13
1
0
09 Oct 2023
Transformer Fusion with Optimal Transport
Transformer Fusion with Optimal Transport
Moritz Imfeld
Jacopo Graldi
Marco Giordano
Thomas Hofmann
Sotiris Anagnostidis
Sidak Pal Singh
ViT
MoMe
22
16
0
09 Oct 2023
Recurrent Neural Language Models as Probabilistic Finite-state Automata
Recurrent Neural Language Models as Probabilistic Finite-state Automata
Anej Svete
Ryan Cotterell
28
2
0
08 Oct 2023
A Benchmark for Learning to Translate a New Language from One Grammar
  Book
A Benchmark for Learning to Translate a New Language from One Grammar Book
Garrett Tanzer
Mirac Suzgun
Chenguang Xi
Dan Jurafsky
Luke Melas-Kyriazi
24
51
0
28 Sep 2023
Small-scale proxies for large-scale Transformer training instabilities
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
35
80
0
25 Sep 2023
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language
  Models
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models
Ahmad Faiz
S. Kaneda
Ruhan Wang
Rita Osi
Parteek Sharma
Fan Chen
Lei Jiang
23
56
0
25 Sep 2023
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Guan-Bo Wang
Sijie Cheng
Xianyuan Zhan
Xiangang Li
Sen Song
Yang Liu
ALM
13
227
0
20 Sep 2023
Language Modeling Is Compression
Language Modeling Is Compression
Grégoire Delétang
Anian Ruoss
Paul-Ambroise Duquenne
Elliot Catt
Tim Genewein
...
Wenliang Kevin Li
Matthew Aitchison
Laurent Orseau
Marcus Hutter
J. Veness
AI4CE
30
129
0
19 Sep 2023
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and
  Luck
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
43
8
0
07 Sep 2023
GPT Can Solve Mathematical Problems Without a Calculator
GPT Can Solve Mathematical Problems Without a Calculator
Z. Yang
Ming Ding
Qingsong Lv
Zhihuan Jiang
Zehai He
Yuyi Guo
Jinfeng Bai
Jie Tang
RALM
LRM
26
52
0
06 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
57
4
0
28 Aug 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
J. Liu
62
31
0
27 Aug 2023
MedAlign: A Clinician-Generated Dataset for Instruction Following with
  Electronic Medical Records
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Scott L. Fleming
Alejandro Lozano
W. Haberkorn
Jenelle A. Jindal
E. Reis
...
Jonathan H. Chen
Keith Morse
Emma Brunskill
Jason Alan Fries
N. Shah
LM&MA
28
53
0
27 Aug 2023
kTrans: Knowledge-Aware Transformer for Binary Code Embedding
kTrans: Knowledge-Aware Transformer for Binary Code Embedding
Wenyu Zhu
Hao Wang
Yuchen Zhou
Jiaming Wang
Zihan Sha
Zeyu Gao
Chao Zhang
17
10
0
24 Aug 2023
Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature
Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature
Walter Hernandez Cruz
K. Tylinski
Alastair Moore
Niall Roche
Nikhil Vadgama
Horst Treiblmaier
J. Shangguan
Paolo Tasca
Jiahua Xu
21
2
0
23 Aug 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
52
14
0
23 Aug 2023
SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence
  Understanding
SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding
Tianyu Yu
Chengyue Jiang
Chao Lou
Shen Huang
Xiaobin Wang
...
Haitao Zheng
Ningyu Zhang
Pengjun Xie
Fei Huang
Yong-jia Jiang
LRM
46
17
0
21 Aug 2023
LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series
  Forecasters
LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters
Ching Chang
Wei-Yao Wang
Wenjie Peng
Tien-Fu Chen
AI4TS
30
45
0
16 Aug 2023
The Costly Dilemma: Generalization, Evaluation and Cost-Optimal
  Deployment of Large Language Models
The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models
Abi Aryan
Aakash Kumar Nain
Andrew McMahon
Lucas Augusto Meyer
Harpreet Sahota
22
6
0
15 Aug 2023
Composable Function-preserving Expansions for Transformer Architectures
Composable Function-preserving Expansions for Transformer Architectures
Andrea Gesmundo
Kaitlin Maile
AI4CE
32
8
0
11 Aug 2023
Continual Pre-Training of Large Language Models: How to (re)warm your
  model?
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Kshitij Gupta
Benjamin Thérien
Adam Ibrahim
Mats L. Richter
Quentin G. Anthony
Eugene Belilovsky
Irina Rish
Timothée Lesort
KELM
22
99
0
08 Aug 2023
From Sparse to Soft Mixtures of Experts
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023
Scaling Sentence Embeddings with Large Language Models
Scaling Sentence Embeddings with Large Language Models
Ting Jiang
Shaohan Huang
Zhongzhi Luan
Deqing Wang
Fuzhen Zhuang
LRM
34
40
0
31 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
18
117
0
25 Jul 2023
Opinion Mining Using Population-tuned Generative Language Models
Opinion Mining Using Population-tuned Generative Language Models
Allmin Pradhap Singh Susaiyah
Abhinay Pandya
Aki Härmä
8
0
0
24 Jul 2023
In-Context Learning Learns Label Relationships but Is Not Conventional
  Learning
In-Context Learning Learns Label Relationships but Is Not Conventional Learning
Jannik Kossen
Y. Gal
Tom Rainforth
32
27
0
23 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
88
10,947
0
18 Jul 2023
Mini-Giants: "Small" Language Models and Open Source Win-Win
Mini-Giants: "Small" Language Models and Open Source Win-Win
Zhengping Zhou
Lezhi Li
Xinxi Chen
Andy Li
SyDa
ALM
MoE
24
6
0
17 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For
  Transformer-based Language Models
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
13
41
0
12 Jul 2023
Continual Learning as Computationally Constrained Reinforcement Learning
Continual Learning as Computationally Constrained Reinforcement Learning
Saurabh Kumar
Henrik Marklund
Anand Srinivasa Rao
Yifan Zhu
Hong Jun Jeon
Yueyang Liu
Benjamin Van Roy
CLL
27
22
0
10 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
JourneyDB: A Benchmark for Generative Image Understanding
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Hongsheng Li
31
101
0
03 Jul 2023
Previous
1234567
Next