arXiv:2201.11990
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
28 January 2022
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
Jared Casper
Zhun Liu
Shrimai Prabhumoye
George Zerveas
V. Korthikanti
Elton Zhang
R. Child
Reza Yazdani Aminabadi
J. Bernauer
Xia Song
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
MoE
Papers citing
"Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"
50 / 501 papers shown
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
Ariel Ekgren
Amaru Cuba Gyllensten
Felix Stollenwerk
Joey Öhman
T. Isbister
Evangelia Gogoulou
F. Carlsson
Alice Heiman
Judit Casademont
Magnus Sahlgren
21
13
0
22 May 2023
Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model
Xiao Wang
Wei Zhou
Qi Zhang
Jie Zhou
Songyang Gao
Junzhe Wang
Menghan Zhang
Xiang Gao
Yunwen Chen
Tao Gui
34
7
0
22 May 2023
Evaluation of medium-large Language Models at zero-shot closed book generative question answering
René Peinl
Johannes Wirth
ELM
18
7
0
19 May 2023
Decouple knowledge from parameters for plug-and-play language modeling
Xin Cheng
Yankai Lin
Xiuying Chen
Dongyan Zhao
Rui Yan
KELM
22
2
0
19 May 2023
TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks
Shubhra (Santu) Karmaker
Dongji Feng
25
50
0
19 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
27
81
0
19 May 2023
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models
Yixiong Chen
Li Liu
C. Ding
23
21
0
18 May 2023
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Zeyu Wang
...
Chunan Shi
Zhuoming Chen
Daiyaan Arfeen
Reyna Abhyankar
Zhihao Jia
LRM
42
28
0
16 May 2023
When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust
Minh Le Nguyen
Duy-Hung Nguyen
Shahab Sabahi
Hung Le
Jeffrey Yang
Hajime Hotta
11
1
0
12 May 2023
Bot or Human? Detecting ChatGPT Imposters with A Single Question
Hong Wang
Xuan Luo
Weizhi Wang
Xifeng Yan
DeLMO
14
26
0
10 May 2023
LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM
Wenhui Hua
Brian Williams
Davood Shamsi
26
3
0
10 May 2023
Evaluating Embedding APIs for Information Retrieval
Ehsan Kamalloo
Xinyu Crystina Zhang
Odunayo Ogundepo
Nandan Thakur
David Alfonso-Hermelo
Mehdi Rezagholizadeh
Jimmy J. Lin
RALM
27
19
0
10 May 2023
Exploring the Landscape of Machine Unlearning: A Comprehensive Survey and Taxonomy
T. Shaik
Xiaohui Tao
Haoran Xie
Lin Li
Xiaofeng Zhu
Qingyuan Li
MU
30
25
0
10 May 2023
Large Language Models Need Holistically Thought in Medical Conversational QA
Yixuan Weng
Bin Li
Fei Xia
Minjun Zhu
Bing Sun
Shizhu He
Kang Liu
Jun Zhao
LM&MA
AI4MH
LRM
DiffM
ELM
14
5
0
09 May 2023
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Shan Zhong
Zhongzhan Huang
Wushao Wen
Jinghui Qin
Liang Lin
19
40
0
09 May 2023
How Do In-Context Examples Affect Compositional Generalization?
Shengnan An
Zeqi Lin
Qiang Fu
B. Chen
Nanning Zheng
Jian-Guang Lou
Dongmei Zhang
30
49
0
08 May 2023
Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization
Anastasia Razdaibiedina
Yuning Mao
Rui Hou
Madian Khabsa
M. Lewis
Jimmy Ba
Amjad Almahairi
VLM
11
42
0
06 May 2023
BranchNorm: Robustly Scaling Extremely Deep Transformers
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
27
3
0
04 May 2023
Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era
Dong Zhang
13
3
0
04 May 2023
AutoML-GPT: Automatic Machine Learning with GPT
Shujian Zhang
Chengyue Gong
Lemeng Wu
Xingchao Liu
Mi Zhou
LLMAG
52
59
0
04 May 2023
Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
Deepak Narayanan
Keshav Santhanam
Peter Henderson
Rishi Bommasani
Tony Lee
Percy Liang
137
3
0
03 May 2023
Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents
Yu-Chih Chen
So Yeon Min
Chase Davis
Ruslan Salakhutdinov
A. Azaria
Yuan-Fang Li
Tom Michael Mitchell
A. Bovik
LM&Ro
LLMAG
70
32
0
03 May 2023
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Lokesh Nagalapatti
Chun-Liang Li
Chih-Kuan Yeh
Hootan Nakhost
Yasuhisa Fujii
Alexander Ratner
Ranjay Krishna
Chen-Yu Lee
Tomas Pfister
ALM
206
499
0
03 May 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Xia Hu
LM&MA
125
614
0
26 Apr 2023
On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research
Luiza Amador Pozzobon
B. Ermiş
Patrick Lewis
Sara Hooker
24
45
0
24 Apr 2023
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology
Yixing Huang
A. Gomaa
S. Semrau
M. Haderlein
S. Lettmaier
...
L. Distel
Andreas K. Maier
R. Fietkau
Christoph Bert
F. Putz
ELM
LM&MA
AI4MH
19
9
0
24 Apr 2023
Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models
Jiashuo Sun
Yi Luo
Yeyun Gong
Chen Lin
Yelong Shen
Jian Guo
Nan Duan
LRM
30
19
0
23 Apr 2023
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
Xin Chen
Hengheng Zhang
Xiaotao Gu
Kaifeng Bi
Lingxi Xie
Qi Tian
MoE
14
4
0
22 Apr 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
41
1,896
0
20 Apr 2023
GPT-NER: Named Entity Recognition via Large Language Models
Shuhe Wang
Xiaofei Sun
Xiaoya Li
Rongbin Ouyang
Fei Wu
Tianwei Zhang
Jiwei Li
Guoyin Wang
18
176
0
20 Apr 2023
A Theory on Adam Instability in Large-Scale Machine Learning
Igor Molybog
Peter Albert
Moya Chen
Zach DeVito
David Esiobu
...
Puxin Xu
Yuchen Zhang
Melanie Kambadur
Stephen Roller
Susan Zhang
AI4CE
20
29
0
19 Apr 2023
UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining
Hyung Won Chung
Noah Constant
Xavier Garcia
Adam Roberts
Yi Tay
Sharan Narang
Orhan Firat
21
49
0
18 Apr 2023
Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Xiuying Wei
Yunchen Zhang
Yuhang Li
Xiangguo Zhang
Ruihao Gong
Jian Ren
Zhengang Li
MQ
13
30
0
18 Apr 2023
ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models
Yikang Liu
Ziyin Zhang
Wanyang Zhang
Shisen Yue
Xiaojing Zhao
Xinyuan Cheng
Yiwen Zhang
Hai Hu
DeLMO
14
49
0
16 Apr 2023
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
Yiqun Yao
Siqi Fan
Xiusheng Huang
Xuezhi Fang
Xiang Li
...
Peng Han
Shuo Shang
Kang Liu
Aixin Sun
Yequan Wang
17
6
0
14 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
11
399
0
13 Apr 2023
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
Boxin Wang
Wei Ping
P. Xu
Lawrence C. McAfee
Zihan Liu
...
Oleksii Kuchaiev
Bo-wen Li
Chaowei Xiao
Anima Anandkumar
Bryan Catanzaro
RALM
32
55
0
13 Apr 2023
TinyReptile: TinyML with Federated Meta-Learning
Haoyu Ren
Darko Anicic
Thomas Runkler
25
16
0
11 Apr 2023
Financial Time Series Forecasting using CNN and Transformer
Zhen Zeng
Rachneet Kaur
S. Siddagangappa
Saba Rahimi
T. Balch
Manuela Veloso
AI4TS
AIFin
11
21
0
11 Apr 2023
Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study
Zengzhi Wang
Qiming Xie
Yi Feng
Zixiang Ding
Zinong Yang
Rui Xia
AI4MH
LLMAG
19
146
0
10 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
24
39
0
07 Apr 2023
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey
Gurpreet Gosal
Zhiming Chen
Chen
Hemant Khachane
William Marshall
Ribhu Pathria
Marvin Tom
Joel Hestness
MoE
LRM
25
98
0
06 Apr 2023
Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks
Zejiang Shen
Tal August
Pao Siangliulue
Kyle Lo
Jonathan Bragg
Jeff Hammerbacher
Doug Downey
Joseph Chee Chang
David Sontag
ELM
14
18
0
05 Apr 2023
Adopting Two Supervisors for Efficient Use of Large-Scale Remote Deep Neural Networks
Michael Weiss
Paolo Tonella
AI4CE
10
0
0
05 Apr 2023
Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data
Vladislav Lialin
Stephen Rawls
David M. Chan
Shalini Ghosh
Anna Rumshisky
Wael Hamza
VLM
AI4TS
28
6
0
04 Apr 2023
Effective Theory of Transformers at Initialization
Emily Dinan
Sho Yaida
Susan Zhang
20
14
0
04 Apr 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
...
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
30
1,164
0
03 Apr 2023
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
51
780
0
30 Mar 2023
The Online Pause and Resume Problem: Optimal Algorithms and An Application to Carbon-Aware Load Shifting
Adam Lechowicz
Nicolas H. Christianson
Jinhang Zuo
Noman Bashir
Mohammad Hajiesmaili
Adam Wierman
Prashant J. Shenoy
21
15
0
30 Mar 2023
Language Models can Solve Computer Tasks
Geunwoo Kim
Pierre Baldi
Stephen Marcus McAleer
LLMAG
LM&Ro
35
337
0
30 Mar 2023