arXiv: 2201.11990
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

28 January 2022
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
Jared Casper
Zhun Liu
Shrimai Prabhumoye
George Zerveas
V. Korthikanti
Elton Zhang
R. Child
Reza Yazdani Aminabadi
J. Bernauer
Xia Song
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
    MoE

Papers citing "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"

50 / 501 papers shown
Reasoning in Large Language Models Through Symbolic Math Word Problems
Vedant Gaur
Nikunj Saunshi
ReLM
LRM
10
25
0
03 Aug 2023
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
Jiaao Chen
Xiaoman Pan
Dian Yu
Kaiqiang Song
Xiaoyang Wang
Dong Yu
Jianshu Chen
ReLM
LRM
19
24
0
01 Aug 2023
A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks
Yanis Labrak
Mickael Rouvier
Richard Dufour
LM&MA
8
25
0
22 Jul 2023
Transferability of Convolutional Neural Networks in Stationary Learning Tasks
Damian Owerko
Charilaos I. Kanatsoulis
Jennifer Bondarchuk
Donald J. Bucci
Alejandro Ribeiro
BDL
21
0
0
21 Jul 2023
ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats
Xiaoxia Wu
Z. Yao
Yuxiong He
MQ
27
43
0
19 Jul 2023
Mini-Giants: "Small" Language Models and Open Source Win-Win
Zhengping Zhou
Lezhi Li
Xinxi Chen
Andy Li
SyDa
ALM
MoE
24
6
0
17 Jul 2023
A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata
Sparsh Mittal
M. Emani
V. Vishwanath
Arun Somani
29
62
0
16 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
46
514
0
12 Jul 2023
PolyLM: An Open Source Polyglot Large Language Model
Xiangpeng Wei
Hao-Ran Wei
Huan Lin
Tianhao Li
Pei Zhang
...
Yu Bowen
Dayiheng Liu
Baosong Yang
Fei Huang
Jun Xie
LRM
32
55
0
12 Jul 2023
Continual Learning as Computationally Constrained Reinforcement Learning
Saurabh Kumar
Henrik Marklund
Anand Srinivasa Rao
Yifan Zhu
Hong Jun Jeon
Yueyang Liu
Benjamin Van Roy
CLL
27
22
0
10 Jul 2023
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Neeraj Varshney
Wenlin Yao
Hongming Zhang
Jianshu Chen
Dong Yu
HILM
25
155
0
08 Jul 2023
Large Language Models for Supply Chain Optimization
Beibin Li
Konstantina Mellou
Bo-qing Zhang
Jeevan Pathuri
Ishai Menache
13
43
0
08 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
58
1,496
0
06 Jul 2023
Several categories of Large Language Models (LLMs): A Short Survey
Saurabh Pahune
Manoj Chandrasekharan
AILaw
17
14
0
05 Jul 2023
Improving Automatic Parallel Training via Balanced Memory Workload Optimization
Yujie Wang
Youhe Jiang
Xupeng Miao
Fangcheng Fu
Shenhan Zhu
Xiaonan Nie
Yaofeng Tu
Bin Cui
22
9
0
05 Jul 2023
CARE-MI: Chinese Benchmark for Misinformation Evaluation in Maternity and Infant Care
Tong Xiang
Liangzhi Li
Wangyue Li
Min‐Jun Bai
Lu Wei
Bowen Wang
Noa Garcia
28
5
0
04 Jul 2023
An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs
Haihao Shen
Hengyu Meng
Bo Dong
Zhe Wang
Ofir Zafrir
...
Hanwen Chang
Qun Gao
Zi. Wang
Guy Boudoukh
Moshe Wasserblat
MoE
21
4
0
28 Jun 2023
Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping
Daniel Zou
X. Jin
Xueyang Yu
Haotian Zhang
J. Demmel
MoE
16
0
0
24 Jun 2023
AI could create a perfect storm of climate misinformation
V. Galaz
Hannah Metzler
Stefan Daume
A. Olsson
B. Lindström
A. Marklund
16
5
0
22 Jun 2023
GPT-Based Models Meet Simulation: How to Efficiently Use Large-Scale Pre-Trained Language Models Across Simulation Tasks
Philippe J. Giabbanelli
LLMAG
ALM
AI4CE
15
13
0
21 Jun 2023
Mitigating Communication Costs in Neural Networks: The Role of Dendritic Nonlinearity
Xundong Wu
Pengfei Zhao
Zilin Yu
Lei Ma
K. Yip
Huajin Tang
Gang Pan
Poirazi Panayiota
Tiejun Huang
18
0
0
21 Jun 2023
DropCompute: simple and more robust distributed synchronous training via compute variance reduction
Niv Giladi
Shahar Gottlieb
Moran Shkolnik
A. Karnieli
Ron Banner
Elad Hoffer
Kfir Y. Levy
Daniel Soudry
23
2
0
18 Jun 2023
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training
Guanhua Wang
Heyang Qin
S. A. Jacobs
Connor Holmes
Samyam Rajbhandari
Olatunji Ruwase
Feng Yan
Lei Yang
Yuxiong He
VLM
53
56
0
16 Jun 2023
Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions
Dongshuo Yin
Xueting Han
Bin Li
Hao Feng
Jinghua Bai
VPVLM
26
16
0
16 Jun 2023
Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant
Xianbiao Qi
Jianan Wang
Lei Zhang
13
0
0
15 Jun 2023
Solving Large-scale Spatial Problems with Convolutional Neural Networks
Damian Owerko
Charilaos I. Kanatsoulis
Alejandro Ribeiro
14
2
0
14 Jun 2023
SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim
Coleman Hooper
A. Gholami
Zhen Dong
Xiuyu Li
Sheng Shen
Michael W. Mahoney
Kurt Keutzer
MQ
24
165
0
13 Jun 2023
AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing
Asaad Alghamdi
Xinyu Duan
Wei Jiang
Zhenhai Wang
Yimeng Wu
...
Yifei Zheng
Mehdi Rezagholizadeh
Baoxing Huai
Peilun Cheng
Abbas Ghaddar
VLM
14
8
0
11 Jun 2023
The Age of Synthetic Realities: Challenges and Opportunities
J. P. Cardenuto
Jing Yang
Rafael Padilha
Renjie Wan
Daniel Moreira
Haoliang Li
Shiqi Wang
Fernanda A. Andaló
Sébastien Marcel
Anderson de Rezende Rocha
DeLMO
42
29
0
09 Jun 2023
STEPS: A Benchmark for Order Reasoning in Sequential Tasks
Weizhi Wang
Hong Wang
Xi Yan
LRM
25
1
0
07 Jun 2023
An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models
Zhongbin Xie
Thomas Lukasiewicz
19
12
0
06 Jun 2023
On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training Research
Made Nindyatama Nityasya
Haryo Akbarianto Wibowo
Alham Fikri Aji
Genta Indra Winata
Radityo Eko Prasojo
Phil Blunsom
A. Kuncoro
13
8
0
05 Jun 2023
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration
Aleksandra Piktus
Odunayo Ogundepo
Christopher Akiki
Akintunde Oladipo
Xinyu Crystina Zhang
Hailey Schoelkopf
Stella Biderman
Martin Potthast
Jimmy J. Lin
CVBM
33
10
0
02 Jun 2023
An Overview on Generative AI at Scale with Edge-Cloud Computing
Yun Cheng Wang
Jintang Xue
Chengwei Wei
C.-C. Jay Kuo
24
30
0
02 Jun 2023
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task
Ning Ding
Yehui Tang
Zhongqian Fu
Chaoting Xu
Kai Han
Yunhe Wang
MLLM
VLM
29
2
0
01 Jun 2023
Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training
Yijia Zhang
Yibo Han
Shijie Cao
Guohao Dai
Youshan Miao
Ting Cao
Fan Yang
Ningyi Xu
21
4
0
31 May 2023
Stochastic Bridges as Effective Regularizers for Parameter-Efficient Tuning
Weize Chen
Xu Han
Yankai Lin
Zhiyuan Liu
Maosong Sun
Jie Zhou
11
1
0
28 May 2023
On Evaluating Adversarial Robustness of Large Vision-Language Models
Yunqing Zhao
Tianyu Pang
Chao Du
Xiao Yang
Chongxuan Li
Ngai-man Cheung
Min-Bin Lin
VLM
AAML
MLLM
14
166
0
26 May 2023
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
21
197
0
25 May 2023
Training Data Extraction From Pre-trained Language Models: A Survey
Shotaro Ishihara
24
46
0
25 May 2023
Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training
Shengwei Li
Zhiquan Lai
Yanqi Hao
Weijie Liu
Ke-shi Ge
Xiaoge Deng
Dongsheng Li
KaiCheng Lu
11
10
0
25 May 2023
Skill-Based Few-Shot Selection for In-Context Learning
Shengnan An
Bo Zhou
Zeqi Lin
Qiang Fu
B. Chen
Nanning Zheng
Weizhu Chen
Jian-Guang Lou
29
31
0
23 May 2023
DetGPT: Detect What You Need via Reasoning
Renjie Pi
Jiahui Gao
Shizhe Diao
Rui Pan
Hanze Dong
...
Lewei Yao
Jianhua Han
Hang Xu
Lingpeng Kong
Tong Zhang
LRM
LM&Ro
22
92
0
23 May 2023
Revisiting Acceptability Judgements
Hai Hu
Ziyin Zhang
Wei Huang
J. Lai
Aini Li
Yi Ma
Jiahui Huang
Peng Zhang
Chien-Jer Charles Lin
Rui Wang
37
2
0
23 May 2023
Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models
Alfonso Amayuelas
Kyle Wong
Liangming Pan
Wenhu Chen
W. Wang
34
25
0
23 May 2023
InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning
Samuel Cahyawijaya
Holy Lovenia
Tiezheng Yu
Willy Chung
Pascale Fung
ALM
39
14
0
23 May 2023
A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs
Siddharth Singh
Prajwal Singhania
Aditya K. Ranjan
Zack Sating
A. Bhatele
17
3
0
22 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
92
76
0
22 May 2023
Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance
Yue Zhang
Leyang Cui
Deng Cai
Xinting Huang
Tao Fang
Wei Bi
ALM
21
34
0
22 May 2023
InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT
Yichong Xu
Ruochen Xu
Dan Iter
Yang Liu
Shuohang Wang
Chenguang Zhu
Michael Zeng
19
10
0
22 May 2023