2201.11990
Cited By
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
28 January 2022
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
Jared Casper
Zhun Liu
Shrimai Prabhumoye
George Zerveas
V. Korthikanti
Elton Zhang
R. Child
Reza Yazdani Aminabadi
J. Bernauer
Xia Song
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
MoE
Papers citing
"Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"
50 / 501 papers shown
Title
ExClaim: Explainable Neural Claim Verification Using Rationalization
Sai Gurrapu
Lifu Huang
Feras A. Batarseh
AAML
24
8
0
21 Jan 2023
ATP: Adaptive Tensor Parallelism for Foundation Models
Shenggan Cheng
Ziming Liu
Jiangsu Du
Yang You
16
6
0
20 Jan 2023
Learning Quantum Processes with Memory -- Quantum Recurrent Neural Networks
Dmytro Bondarenko
Robert Salzmann
Viktoria-S. Schmiesing
8
4
0
19 Jan 2023
Towards Sustainable Artificial Intelligence: An Overview of Environmental Protection Uses and Issues
Arnault Pachot
Céline Patissier
17
14
0
22 Dec 2022
SPT: Semi-Parametric Prompt Tuning for Multitask Prompted Learning
M Saiful Bari
Aston Zhang
Shuai Zheng
Xingjian Shi
Yi Zhu
Shafiq R. Joty
Mu Li
RALM
VLM
VPVLM
LRM
35
5
0
21 Dec 2022
JASMINE: Arabic GPT Models for Few-Shot Learning
El Moatez Billah Nagoudi
Muhammad Abdul-Mageed
AbdelRahim Elmadany
Alcides Alcoba Inciarte
Md. Tawkat Islam Khondaker
25
7
0
21 Dec 2022
Data Curation Alone Can Stabilize In-context Learning
Ting-Yun Chang
Robin Jia
19
51
0
20 Dec 2022
Identifying and Manipulating the Personality Traits of Language Models
Graham Caron
Shashank Srivastava
10
37
0
20 Dec 2022
Optimizing Prompts for Text-to-Image Generation
Y. Hao
Zewen Chi
Li Dong
Furu Wei
27
139
0
19 Dec 2022
Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale
Hritik Bansal
Karthik Gopalakrishnan
Saket Dingliwal
S. Bodapati
Katrin Kirchhoff
Dan Roth
LRM
22
48
0
18 Dec 2022
Quant 4.0: Engineering Quantitative Investment with Automated, Explainable and Knowledge-driven Artificial Intelligence
Jian Guo
Sai Wang
L. Ni
H. Shum
AIFin
19
7
0
13 Dec 2022
Structured information extraction from complex scientific text with fine-tuned large language models
Alex Dunn
John Dagdelen
Nicholas Walker
Sanghoon Lee
Andrew S. Rosen
Gerbrand Ceder
Kristin A. Persson
Anubhav Jain
16
89
0
10 Dec 2022
LEAD: Liberal Feature-based Distillation for Dense Retrieval
Hao-Lun Sun
Xiao Liu
Yeyun Gong
Anlei Dong
Jing Lu
Yan Zhang
Linjun Yang
Rangan Majumder
Nan Duan
44
2
0
10 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
32
43
0
09 Dec 2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Conglong Li
Z. Yao
Xiaoxia Wu
Minjia Zhang
Connor Holmes
Cheng Li
Yuxiong He
19
24
0
07 Dec 2022
Protein Language Models and Structure Prediction: Connection and Progression
Bozhen Hu
Jun-Xiong Xia
Jiangbin Zheng
Cheng Tan
Yufei Huang
Yongjie Xu
Stan Z. Li
19
40
0
30 Nov 2022
COMET: A Comprehensive Cluster Design Methodology for Distributed Deep Learning Training
D. Kadiyala
Saeed Rashidi
Taekyung Heo
A. Bambhaniya
T. Krishna
Alexandros Daglis
VLM
19
9
0
30 Nov 2022
Compressing Cross-Lingual Multi-Task Models at Qualtrics
Daniel Fernando Campos
Daniel J. Perry
S. Joshi
Yashmeet Gambhir
Wei Du
Zhengzheng Xing
Aaron Colak
8
1
0
29 Nov 2022
Understanding BLOOM: An empirical study on diverse NLP tasks
Parag Dakle
Sai Krishna Rallabandi
Preethi Raghavan
AI4CE
31
3
0
27 Nov 2022
A Survey of Text Representation Methods and Their Genealogy
Philipp Siebers
Christian Janiesch
Patrick Zschech
AI4TS
12
9
0
26 Nov 2022
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Xupeng Miao
Yujie Wang
Youhe Jiang
Chunan Shi
Xiaonan Nie
Hailin Zhang
Bin Cui
GNN
MoE
29
60
0
25 Nov 2022
SciAI4Industry -- Solving PDEs for industry-scale problems with deep learning
Philipp A. Witte
Russell J. Hewett
K. Saurabh
A. Sojoodi
Ranveer Chandra
AI4CE
13
2
0
23 Nov 2022
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Wenhu Chen
Xueguang Ma
Xinyi Wang
William W. Cohen
ReLM
ReCod
LRM
56
731
0
22 Nov 2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Guangxuan Xiao
Ji Lin
Mickael Seznec
Hao Wu
Julien Demouth
Song Han
MQ
59
728
0
18 Nov 2022
Metadata Might Make Language Models Better
K. Beelen
Daniel Alexander van Strien
AI4CE
16
0
0
18 Nov 2022
Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production
Young Jin Kim
Rawn Henry
Raffy Fahim
Hany Awadalla
MoE
21
23
0
18 Nov 2022
Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding
Mirac Suzgun
Luke Melas-Kyriazi
Dan Jurafsky
22
43
0
14 Nov 2022
Aspects of scaling and scalability for flow-based sampling of lattice QCD
Ryan Abbott
M. S. Albergo
Aleksandar Botev
D. Boyda
Kyle Cranmer
...
Ali Razavi
Danilo Jimenez Rezende
F. Romero-López
P. Shanahan
Julian M. Urban
24
33
0
14 Nov 2022
Breadth-First Pipeline Parallelism
J. Lamy-Poirier
GNN
MoE
AI4CE
20
1
0
11 Nov 2022
Large Language Models with Controllable Working Memory
Daliang Li
A. S. Rawat
Manzil Zaheer
Xin Wang
Michal Lukasik
Andreas Veit
Felix X. Yu
Surinder Kumar
KELM
34
151
0
09 Nov 2022
Efficiently Scaling Transformer Inference
Reiner Pope
Sholto Douglas
Aakanksha Chowdhery
Jacob Devlin
James Bradbury
Anselm Levskaya
Jonathan Heek
Kefan Xiao
Shivani Agrawal
J. Dean
21
292
0
09 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
89
2,301
0
09 Nov 2022
Astronomia ex machina: a history, primer, and outlook on neural networks in astronomy
Michael J. Smith
James E. Geach
21
32
0
07 Nov 2022
LMentry: A Language Model Benchmark of Elementary Language Tasks
Avia Efrat
Or Honovich
Omer Levy
27
19
0
03 Nov 2022
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
Yihan Wang
Si Si
Daliang Li
Michal Lukasik
Felix X. Yu
Cho-Jui Hsieh
Inderjit S Dhillon
Sanjiv Kumar
34
29
0
01 Nov 2022
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
Yaqing Wang
Sahaj Agarwal
Subhabrata Mukherjee
Xiaodong Liu
Jing Gao
Ahmed Hassan Awadallah
Jianfeng Gao
MoE
11
117
0
31 Oct 2022
Changes from Classical Statistics to Modern Statistics and Data Science
Kai Zhang
Shan-Yu Liu
M. Xiong
26
0
0
30 Oct 2022
A Solvable Model of Neural Scaling Laws
A. Maloney
Daniel A. Roberts
J. Sully
29
51
0
30 Oct 2022
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
225
103
0
27 Oct 2022
Fast DistilBERT on CPUs
Haihao Shen
Ofir Zafrir
Bo Dong
Hengyu Meng
Xinyu Ye
Zhe Wang
Yi Ding
Hanwen Chang
Guy Boudoukh
Moshe Wasserblat
VLM
16
2
0
27 Oct 2022
TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection
Piyush Behre
S.S. Tan
A. Shah
Harini Kesavamoorthy
Shuangyu Chang
Fei Zuo
C. Basoglu
Sayan D. Pathak
6
0
0
27 Oct 2022
Evaluating Parameter Efficient Learning for Generation
Peng-Tao Xu
M. Patwary
Shrimai Prabhumoye
Virginia Adams
R. Prenger
Wei Ping
Nayeon Lee
M. Shoeybi
Bryan Catanzaro
MoE
18
3
0
25 Oct 2022
Leveraging Large Language Models for Multiple Choice Question Answering
Joshua Robinson
Christopher Rytting
David Wingate
ELM
138
186
0
22 Oct 2022
Model Criticism for Long-Form Text Generation
Yuntian Deng
Volodymyr Kuleshov
Alexander M. Rush
31
19
0
16 Oct 2022
Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
Yejin Bang
Tiezheng Yu
Andrea Madotto
Zhaojiang Lin
Mona T. Diab
Pascale Fung
17
13
0
14 Oct 2022
PCFG-based Natural Language Interface Improves Generalization for Controlled Text Generation
Jingyu Zhang
James R. Glass
Tianxing He
16
2
0
14 Oct 2022
Bootstrapping Multilingual Semantic Parsers using Large Language Models
Abhijeet Awasthi
Nitish Gupta
Bidisha Samanta
Shachi Dave
Sunita Sarawagi
Partha P. Talukdar
24
7
0
13 Oct 2022
Language Model Decoding as Likelihood-Utility Alignment
Martin Josifoski
Maxime Peyrard
Frano Rajic
Jiheng Wei
Debjit Paul
...
Barun Patra
Vishrav Chaudhary
Emre Kıcıman
Boi Faltings
Robert West
35
4
0
13 Oct 2022
Spontaneous Emerging Preference in Two-tower Language Model
Zhengqi He
Taro Toyoizumi
LRM
13
1
0
13 Oct 2022
Large Language Models are few(1)-shot Table Reasoners
Wenhu Chen
LMTD
ReLM
LRM
9
138
0
13 Oct 2022