ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.15556
  4. Cited By
Training Compute-Optimal Large Language Models

Training Compute-Optimal Large Language Models

29 March 2022
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
Eliza Rutherford
Diego de Las Casas
Lisa Anne Hendricks
Johannes Welbl
Aidan Clark
Tom Hennigan
Eric Noland
Katie Millican
George van den Driessche
Bogdan Damoc
Aurelia Guy
Simon Osindero
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
    AI4TS
ArXivPDFHTML

Papers citing "Training Compute-Optimal Large Language Models"

50 / 316 papers shown
Title
Personality Traits in Large Language Models
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
53
118
0
01 Jul 2023
Neural Algorithmic Reasoning Without Intermediate Supervision
Neural Algorithmic Reasoning Without Intermediate Supervision
Gleb Rodionov
Liudmila Prokhorenkova
OffRL
LRM
OOD
22
10
0
23 Jun 2023
A Simple and Effective Pruning Approach for Large Language Models
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
50
353
0
20 Jun 2023
Large-scale Language Model Rescoring on Long-form Data
Large-scale Language Model Rescoring on Long-form Data
Tongzhou Chen
Cyril Allauzen
Yinghui Huang
Daniel S. Park
David Rybach
...
Rodrigo Cabrera
Kartik Audhkhasi
Bhuvana Ramabhadran
Pedro J. Moreno
Michael Riley
12
14
0
13 Jun 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
40
188
0
12 Jun 2023
An Empirical Analysis of Parameter-Efficient Methods for Debiasing
  Pre-Trained Language Models
An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models
Zhongbin Xie
Thomas Lukasiewicz
19
12
0
06 Jun 2023
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Zechun Liu
Barlas Oğuz
Changsheng Zhao
Ernie Chang
Pierre Stock
Yashar Mehdad
Yangyang Shi
Raghuraman Krishnamoorthi
Vikas Chandra
MQ
42
187
0
29 May 2023
Parameter-Efficient Fine-Tuning without Introducing New Latency
Parameter-Efficient Fine-Tuning without Introducing New Latency
Baohao Liao
Yan Meng
Christof Monz
13
48
0
26 May 2023
Passive learning of active causal strategies in agents and language
  models
Passive learning of active causal strategies in agents and language models
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Ishita Dasgupta
A. Nam
Jane X. Wang
27
15
0
25 May 2023
Dynamic Context Pruning for Efficient and Interpretable Autoregressive
  Transformers
Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis
Dario Pavllo
Luca Biggio
Lorenzo Noci
Aurélien Lucchi
Thomas Hofmann
32
53
0
25 May 2023
Large Language Models for User Interest Journeys
Large Language Models for User Interest Journeys
Konstantina Christakopoulou
Alberto Lalama
Cj Adams
Iris Qu
Yifat Amir
...
Dina Bseiso
Sarah Scodel
Lucas Dixon
Ed H. Chi
Minmin Chen
16
25
0
24 May 2023
In-Context Impersonation Reveals Large Language Models' Strengths and
  Biases
In-Context Impersonation Reveals Large Language Models' Strengths and Biases
Leonard Salewski
Stephan Alaniz
Isabel Rio-Torto
Eric Schulz
Zeynep Akata
22
148
0
24 May 2023
Memory-Efficient Fine-Tuning of Compressed Large Language Models via
  sub-4-bit Integer Quantization
Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Jeonghoon Kim
J. H. Lee
Sungdong Kim
Joonsuk Park
Kang Min Yoo
S. Kwon
Dongsoo Lee
MQ
36
97
0
23 May 2023
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM
  Inference Pipeline
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
Zangwei Zheng
Xiaozhe Ren
Fuzhao Xue
Yang Luo
Xin Jiang
Yang You
24
54
0
22 May 2023
RWKV: Reinventing RNNs for the Transformer Era
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
72
554
0
22 May 2023
Rethinking Semi-supervised Learning with Language Models
Rethinking Semi-supervised Learning with Language Models
Zhengxiang Shi
Francesco Tonolini
Nikolaos Aletras
Emine Yilmaz
G. Kazai
Yunlong Jiao
27
17
0
22 May 2023
PaLM 2 Technical Report
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
58
1,138
0
17 May 2023
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with
  Foundation Models
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Zhimin Chen
Longlong Jing
Yingwei Li
Bing Li
24
31
0
15 May 2023
Improving Small Language Models on PubMedQA via Generative Data
  Augmentation
Improving Small Language Models on PubMedQA via Generative Data Augmentation
Zhen Guo
Peiqi Wang
Yanwei Wang
Shangdi Yu
LM&MA
MedIm
18
10
0
12 May 2023
On the average-case complexity of learning output distributions of
  quantum circuits
On the average-case complexity of learning output distributions of quantum circuits
A. Nietner
M. Ioannou
R. Sweke
R. Kueng
Jens Eisert
M. Hinsche
J. Haferkamp
21
11
0
09 May 2023
When and What to Ask Through World States and Text Instructions: IGLU
  NLP Challenge Solution
When and What to Ask Through World States and Text Instructions: IGLU NLP Challenge Solution
Zhengxiang Shi
Jerome Ramos
To Eun Kim
Xi Wang
Hossein A. Rahmani
Aldo Lipani
14
10
0
09 May 2023
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with
  Large Language Models
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Shan Zhong
Zhongzhan Huang
Wushao Wen
Jinghui Qin
Liang Lin
19
40
0
09 May 2023
MoT: Memory-of-Thought Enables ChatGPT to Self-Improve
MoT: Memory-of-Thought Enables ChatGPT to Self-Improve
Xiaonan Li
Xipeng Qiu
ReLM
KELM
LRM
AI4MH
14
32
0
09 May 2023
The Current State of Summarization
The Current State of Summarization
Fabian Retkowski
23
6
0
08 May 2023
How Do In-Context Examples Affect Compositional Generalization?
How Do In-Context Examples Affect Compositional Generalization?
Shengnan An
Zeqi Lin
Qiang Fu
B. Chen
Nanning Zheng
Jian-Guang Lou
Dongmei Zhang
25
49
0
08 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
27
85
0
02 May 2023
Learning to Compress Prompts with Gist Tokens
Learning to Compress Prompts with Gist Tokens
Jesse Mu
Xiang Lisa Li
Noah D. Goodman
VLM
28
204
0
17 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoE
ALM
24
41
0
17 Apr 2023
STen: Productive and Efficient Sparsity in PyTorch
STen: Productive and Efficient Sparsity in PyTorch
Andrei Ivanov
Nikoli Dryden
Tal Ben-Nun
Saleh Ashkboos
Torsten Hoefler
30
4
0
15 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
49
3,011
0
14 Apr 2023
On the Opportunities and Challenges of Foundation Models for Geospatial
  Artificial Intelligence
On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence
Gengchen Mai
Weiming Huang
Jin Sun
Suhang Song
Deepak Mishra
...
Yingjie Hu
Chris Cundy
Ziyuan Li
Rui Zhu
Ni Lao
AI4CE
22
118
0
13 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
11
399
0
13 Apr 2023
Human-machine cooperation for semantic feature listing
Human-machine cooperation for semantic feature listing
Kushin Mukherjee
Siddharth Suresh
Timothy Rogers
VLM
9
1
0
11 Apr 2023
Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean
  Field Neural Networks
Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks
Blake Bordelon
C. Pehlevan
MLT
35
29
0
06 Apr 2023
On the Pareto Front of Multilingual Neural Machine Translation
On the Pareto Front of Multilingual Neural Machine Translation
Liang Chen
Shuming Ma
Dongdong Zhang
Furu Wei
Baobao Chang
MoE
18
5
0
06 Apr 2023
Inductive biases in deep learning models for weather prediction
Inductive biases in deep learning models for weather prediction
Jannik Thümmel
Matthias Karlbauer
S. Otte
C. Zarfl
Georg Martius
...
Thomas Scholten
Ulrich Friedrich
V. Wulfmeyer
B. Goswami
Martin Volker Butz
AI4CE
31
4
0
06 Apr 2023
Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit
  Diversity Modeling
Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling
Haotao Wang
Ziyu Jiang
Yuning You
Yan Han
Gaowen Liu
Jayanth Srinivasa
Ramana Rao Kompella
Zhangyang Wang
16
27
0
06 Apr 2023
On the Creativity of Large Language Models
On the Creativity of Large Language Models
Giorgio Franceschelli
Mirco Musolesi
64
49
0
27 Mar 2023
Scaling Expert Language Models with Unsupervised Domain Discovery
Scaling Expert Language Models with Unsupervised Domain Discovery
Suchin Gururangan
Margaret Li
M. Lewis
Weijia Shi
Tim Althoff
Noah A. Smith
Luke Zettlemoyer
MoE
15
46
0
24 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of
  Generative AI from GAN to ChatGPT
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
24
501
0
07 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on
  Tasks and Challenges
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
24
4
0
04 Mar 2023
Poisoning Web-Scale Training Datasets is Practical
Poisoning Web-Scale Training Datasets is Practical
Nicholas Carlini
Matthew Jagielski
Christopher A. Choquette-Choo
Daniel Paleka
Will Pearce
Hyrum S. Anderson
Andreas Terzis
Kurt Thomas
Florian Tramèr
SILM
24
181
0
20 Feb 2023
Scaling Laws for Multilingual Neural Machine Translation
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes
Behrooz Ghorbani
Xavier Garcia
Markus Freitag
Orhan Firat
28
28
0
19 Feb 2023
Auditing large language models: a three-layered approach
Auditing large language models: a three-layered approach
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
29
193
0
16 Feb 2023
Counting Carbon: A Survey of Factors Influencing the Emissions of
  Machine Learning
Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning
A. Luccioni
Alex Hernandez-Garcia
24
45
0
16 Feb 2023
Over-parametrization via Lifting for Low-rank Matrix Sensing: Conversion
  of Spurious Solutions to Strict Saddle Points
Over-parametrization via Lifting for Low-rank Matrix Sensing: Conversion of Spurious Solutions to Strict Saddle Points
Ziye Ma
Igor Molybog
Javad Lavaei
Somayeh Sojoudi
13
3
0
15 Feb 2023
Binarized Neural Machine Translation
Binarized Neural Machine Translation
Yichi Zhang
Ankush Garg
Yuan Cao
Lukasz Lew
Behrooz Ghorbani
Zhiru Zhang
Orhan Firat
MQ
29
14
0
09 Feb 2023
The unreasonable effectiveness of few-shot learning for machine
  translation
The unreasonable effectiveness of few-shot learning for machine translation
Xavier Garcia
Yamini Bansal
Colin Cherry
George F. Foster
M. Krikun
Fan Feng
Melvin Johnson
Orhan Firat
27
102
0
02 Feb 2023
A Survey on Efficient Training of Transformers
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
18
47
0
02 Feb 2023
Scaling laws for single-agent reinforcement learning
Scaling laws for single-agent reinforcement learning
Jacob Hilton
Jie Tang
John Schulman
19
20
0
31 Jan 2023
Previous
1234567
Next