ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2201.11990
  4. Cited By
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A
  Large-Scale Generative Language Model

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

28 January 2022
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
Jared Casper
Zhun Liu
Shrimai Prabhumoye
George Zerveas
V. Korthikanti
Elton Zhang
R. Child
Reza Yazdani Aminabadi
J. Bernauer
Xia Song
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
    MoE
ArXivPDFHTML

Papers citing "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"

50 / 501 papers shown
Title
On the Compressibility of Quantized Large Language Models
On the Compressibility of Quantized Large Language Models
Yu Mao
Weilan Wang
Hongchao Du
Nan Guan
Chun Jason Xue
MQ
23
6
0
03 Mar 2024
Arithmetic Control of LLMs for Diverse User Preferences: Directional
  Preference Alignment with Multi-Objective Rewards
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
Haoxiang Wang
Yong Lin
Wei Xiong
Rui Yang
Shizhe Diao
Shuang Qiu
Han Zhao
Tong Zhang
40
70
0
28 Feb 2024
LLM Task Interference: An Initial Study on the Impact of Task-Switch in
  Conversational History
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History
Akash Gupta
Ivaxi Sheth
Vyas Raina
Mark J. F. Gales
Mario Fritz
30
4
0
28 Feb 2024
Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows
Yuting Yang
Andrea Merlina
Weijia Song
Tiancheng Yuan
Ken Birman
Roman Vitenberg
23
0
0
27 Feb 2024
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation
Sunghyeon Woo
Baeseong Park
Byeongwook Kim
Minjung Jo
S. Kwon
Dongsuk Jeon
Dongsoo Lee
57
2
0
27 Feb 2024
Benchmarking LLMs on the Semantic Overlap Summarization Task
Benchmarking LLMs on the Semantic Overlap Summarization Task
John Salvador
Naman Bansal
Mousumi Akter
Souvik Sarkar
Anupam Das
S. Karmaker
31
2
0
26 Feb 2024
Nemotron-4 15B Technical Report
Nemotron-4 15B Technical Report
Jupinder Parmar
Shrimai Prabhumoye
Joseph Jennings
M. Patwary
Sandeep Subramanian
...
Ashwath Aithal
Oleksii Kuchaiev
M. Shoeybi
Jonathan Cohen
Bryan Catanzaro
31
22
0
26 Feb 2024
StochCA: A Novel Approach for Exploiting Pretrained Models with
  Cross-Attention
StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention
SeungWon Seo
Suho Lee
Sangheum Hwang
30
0
0
25 Feb 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000
  GPUs
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Ziheng Jiang
Haibin Lin
Yinmin Zhong
Qi Huang
Yangrui Chen
...
Zhe Li
X. Jia
Jia-jun Ye
Xin Jin
Xin Liu
LRM
38
100
0
23 Feb 2024
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
Anisha Agarwal
Aaron Chan
Shubham Chandel
Jinu Jang
Shaun Miller
Roshanak Zilouchian Moghaddam
Yevhen Mohylevskyy
Neel Sundaresan
Michele Tufano
ELM
16
13
0
22 Feb 2024
AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods
Tim Tsz-Kit Lau
Han Liu
Mladen Kolar
ODL
24
6
0
17 Feb 2024
Empowering Federated Learning for Massive Models with NVIDIA FLARE
Empowering Federated Learning for Massive Models with NVIDIA FLARE
Holger R. Roth
Ziyue Xu
Yuan-Ting Hsieh
Adithya Renduchintala
Isaac Yang
...
Camir Ricketts
Daguang Xu
Chester Chen
Yan Cheng
Andrew Feng
AI4CE
26
5
0
12 Feb 2024
Large Language Models: A Survey
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
120
353
0
09 Feb 2024
ZeroPP: Unleashing Exceptional Parallelism Efficiency through
  Tensor-Parallelism-Free Methodology
ZeroPP: Unleashing Exceptional Parallelism Efficiency through Tensor-Parallelism-Free Methodology
Ding Tang
Lijuan Jiang
Jiecheng Zhou
Minxi Jin
Hengjie Li
Xingcheng Zhang
Zhiling Pei
Jidong Zhai
62
3
0
06 Feb 2024
The Landscape and Challenges of HPC Research and LLMs
The Landscape and Challenges of HPC Research and LLMs
Le Chen
Nesreen K. Ahmed
Akashnil Dutta
Arijit Bhattacharjee
Sixing Yu
...
Vy A. Vo
J. P. Muñoz
Ted Willke
Tim Mattson
Ali Jannesari
AI4CE
29
20
0
03 Feb 2024
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit
  Large Language Models
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models
Xuchen Pan
Yanxi Chen
Yaliang Li
Bolin Ding
Jingren Zhou
13
8
0
01 Feb 2024
When Large Language Models Meet Vector Databases: A Survey
When Large Language Models Meet Vector Databases: A Survey
Zhi Jing
Yongye Su
Yikun Han
Bo Yuan
Haiyun Xu
Chunjiang Liu
Kehai Chen
Min Zhang
53
35
0
30 Jan 2024
The Case for Co-Designing Model Architectures with Hardware
The Case for Co-Designing Model Architectures with Hardware
Quentin G. Anthony
Jacob Hatef
Deepak Narayanan
Stella Biderman
Stas Bekman
Junqi Yin
A. Shafi
Hari Subramoni
Dhabaleswar Panda
3DV
11
4
0
25 Jan 2024
An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue
  Assistant
An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue Assistant
Mohit Tomar
Abhisek Tiwari
Tulika Saha
Prince Jha
Sriparna Saha
14
1
0
10 Jan 2024
TeleChat Technical Report
TeleChat Technical Report
Zhongjiang He
Zihan Wang
Xinzhan Liu
Shixuan Liu
Yitong Yao
...
Zilu Huang
Sishi Xiong
Yuxiang Zhang
Chao Wang
Shuangyong Song
AI4MH
LRM
ALM
56
3
0
08 Jan 2024
Enhanced Automated Code Vulnerability Repair using Large Language Models
Enhanced Automated Code Vulnerability Repair using Large Language Models
David de-Fitero-Dominguez
Eva García-López
Antonio Garcia-Cabot
J. Martínez-Herráiz
14
11
0
08 Jan 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI Xiao Bi
:
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
139
304
0
05 Jan 2024
MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance
MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance
Renjie Pi
Tianyang Han
Jianshu Zhang
Yueqi Xie
Rui Pan
Qing Lian
Hanze Dong
Jipeng Zhang
Tong Zhang
AAML
23
59
0
05 Jan 2024
Training and Serving System of Foundation Models: A Comprehensive Survey
Training and Serving System of Foundation Models: A Comprehensive Survey
Jiahang Zhou
Yanyu Chen
Zicong Hong
Wuhui Chen
Yue Yu
Tao Zhang
Hui Wang
Chuan-fu Zhang
Zibin Zheng
ALM
22
5
0
05 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xintao Hu
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
14
64
0
04 Jan 2024
The Art of Defending: A Systematic Evaluation and Analysis of LLM
  Defense Strategies on Safety and Over-Defensiveness
The Art of Defending: A Systematic Evaluation and Analysis of LLM Defense Strategies on Safety and Over-Defensiveness
Neeraj Varshney
Pavel Dolin
Agastya Seth
Chitta Baral
AAML
ELM
14
47
0
30 Dec 2023
Fairness-Aware Structured Pruning in Transformers
Fairness-Aware Structured Pruning in Transformers
A. Zayed
Gonçalo Mordido
Samira Shabanian
Ioana Baldini
Sarath Chandar
27
15
0
24 Dec 2023
Optimizing Distributed Training on Frontier for Large Language Models
Optimizing Distributed Training on Frontier for Large Language Models
Sajal Dash
Isaac Lyngaas
Junqi Yin
Xiao Wang
Romain Egele
Guojing Cong
Feiyi Wang
Prasanna Balaprakash
ALM
MoE
55
13
0
20 Dec 2023
An Adaptive Placement and Parallelism Framework for Accelerating RLHF
  Training
An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
Youshao Xiao
Weichang Wu
Zhenglei Zhou
Fagui Mao
Shangchun Zhao
Lin Ju
Lei Liang
Xiaolu Zhang
Jun Zhou
13
5
0
19 Dec 2023
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Jiahui Gao
Renjie Pi
Jipeng Zhang
Jiacheng Ye
Wanjun Zhong
...
Lanqing Hong
Jianhua Han
Hang Xu
Zhenguo Li
Lingpeng Kong
SyDa
ReLM
LRM
44
95
0
18 Dec 2023
VILA: On Pre-training for Visual Language Models
VILA: On Pre-training for Visual Language Models
Ji Lin
Hongxu Yin
Wei Ping
Yao Lu
Pavlo Molchanov
Andrew Tao
Huizi Mao
Jan Kautz
M. Shoeybi
Song Han
MLLM
VLM
25
347
0
12 Dec 2023
Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable
  Tensor Collections
Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections
Marcel Wagenlander
Guo Li
Bo Zhao
Luo Mai
Peter R. Pietzuch
22
6
0
08 Dec 2023
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language
  Models with 3D Parallelism
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
21
31
0
08 Dec 2023
Moirai: Towards Optimal Placement for Distributed Inference on
  Heterogeneous Devices
Moirai: Towards Optimal Placement for Distributed Inference on Heterogeneous Devices
Beibei Zhang
Hongwei Zhu
Feng Gao
Zhihui Yang
Xiaoyang Sean Wang
14
1
0
07 Dec 2023
Holmes: Towards Distributed Training Across Clusters with Heterogeneous
  NIC Environment
Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment
Fei Yang
Shuang Peng
Ning Sun
Fangyu Wang
Ke Tan
Fu Wu
Jiezhong Qiu
Aimin Pan
14
4
0
06 Dec 2023
Honesty Is the Best Policy: Defining and Mitigating AI Deception
Honesty Is the Best Policy: Defining and Mitigating AI Deception
Francis Rhys Ward
Francesco Belardinelli
Francesca Toni
Tom Everitt
110
27
0
03 Dec 2023
Temperature Balancing, Layer-wise Weight Analysis, and Neural Network
  Training
Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
Yefan Zhou
Tianyu Pang
Keqin Liu
Charles H. Martin
Michael W. Mahoney
Yaoqing Yang
34
7
0
01 Dec 2023
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models
  Catching up?
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Shafiq R. Joty
ELM
CLL
AI4MH
LRM
ALM
77
27
0
28 Nov 2023
vTrain: A Simulation Framework for Evaluating Cost-effective and
  Compute-optimal Large Language Model Training
vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
Jehyeon Bang
Yujeong Choi
Myeongwoo Kim
Yongdeok Kim
Minsoo Rhu
14
14
0
27 Nov 2023
Justifiable Artificial Intelligence: Engineering Large Language Models
  for Legal Applications
Justifiable Artificial Intelligence: Engineering Large Language Models for Legal Applications
Sabine Wehnert
AILaw
29
4
0
27 Nov 2023
Large Language Models in Law: A Survey
Large Language Models in Law: A Survey
Jinqi Lai
Wensheng Gan
Jiayang Wu
Zhenlian Qi
Philip S. Yu
ELM
AILaw
21
70
0
26 Nov 2023
Who is leading in AI? An analysis of industry AI research
Who is leading in AI? An analysis of industry AI research
Ben Cottier
T. Besiroglu
David Owen
28
7
0
24 Nov 2023
Ethical Implications of ChatGPT in Higher Education: A Scoping Review
Ethical Implications of ChatGPT in Higher Education: A Scoping Review
Ming Li
Ariunaa Enkhtur
Fei Cheng
B. Yamamoto
17
7
0
24 Nov 2023
Deep Tensor Network
Deep Tensor Network
Yifan Zhang
16
0
0
18 Nov 2023
Using Cooperative Game Theory to Prune Neural Networks
Using Cooperative Game Theory to Prune Neural Networks
M. Diaz-Ortiz
Benjamin Kempinski
Daphne Cornelisse
Yoram Bachrach
Tal Kachman
33
2
0
17 Nov 2023
Empirical evaluation of Uncertainty Quantification in
  Retrieval-Augmented Language Models for Science
Empirical evaluation of Uncertainty Quantification in Retrieval-Augmented Language Models for Science
S. Wagle
Sai Munikoti
Anurag Acharya
Sara Smith
Sameera Horawalavithana
8
5
0
15 Nov 2023
On-the-Fly Fusion of Large Language Models and Machine Translation
On-the-Fly Fusion of Large Language Models and Machine Translation
Hieu T. Hoang
Huda Khayrallah
Marcin Junczys-Dowmunt
25
3
0
14 Nov 2023
InfMLLM: A Unified Framework for Visual-Language Tasks
InfMLLM: A Unified Framework for Visual-Language Tasks
Qiang-feng Zhou
Zhibin Wang
Wei Chu
Yinghui Xu
Hao Li
Yuan Qi
MLLM
16
11
0
12 Nov 2023
PerceptionGPT: Effectively Fusing Visual Perception into LLM
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Renjie Pi
Lewei Yao
Jiahui Gao
Jipeng Zhang
Tong Zhang
MLLM
18
30
0
11 Nov 2023
Just-in-time Quantization with Processing-In-Memory for Efficient ML
  Training
Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
M. Ibrahim
Shaizeen Aga
Ada Li
Suchita Pati
Mahzabeen Islam
21
3
0
08 Nov 2023
Previous
123456...91011
Next