2201.11990
Cited By
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
28 January 2022
Shaden Smith
M. Patwary
Brandon Norick
P. LeGresley
Samyam Rajbhandari
Jared Casper
Zhun Liu
Shrimai Prabhumoye
George Zerveas
V. Korthikanti
Elton Zhang
R. Child
Reza Yazdani Aminabadi
J. Bernauer
Xia Song
M. Shoeybi
Yuxiong He
Michael Houston
Saurabh Tiwary
Bryan Catanzaro
MoE
Papers citing
"Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model"
50 / 501 papers shown
Title
Plexus: Taming Billion-edge Graphs with 3D Parallel GNN Training
Aditya K. Ranjan
Siddharth Singh
Cunyang Wei
A. Bhatele
GNN
48
0
0
07 May 2025
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Athinagoras Skiadopoulos
Mark Zhao
Swapnil Gandhi
Thomas Norrie
Shrijeet Mukherjee
Christos Kozyrakis
MoE
91
0
0
28 Apr 2025
Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation
Thomas F Burns
Letitia Parcalabescu
Stephan Wäldchen
Michael Barlow
Gregor Ziegltrum
Volker Stampa
Bastian Harren
Björn Deiseroth
SyDa
36
0
0
24 Apr 2025
Analysing the Robustness of Vision-Language-Models to Common Corruptions
Muhammad Usama
Syeda Aishah Asim
Syed Bilal Ali
Syed Talal Wasim
Umair Bin Mansoor
VLM
36
0
0
18 Apr 2025
NNTile: a machine learning framework capable of training extremely large GPT language models on a single node
A. Mikhalev
Aleksandr Katrutsa
Konstantin Sozykin
Ivan V. Oseledets
25
0
0
17 Apr 2025
Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training
Mingyu Liang
Hiwot Tadese Kassa
Wenyin Fu
Brian Coutinho
Louis Feng
Christina Delimitrou
21
0
0
12 Apr 2025
Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training
Yijie Zheng
Bangjun Xiao
Lei Shi
Xiaoyang Li
Faming Wu
Tianyu Li
Xuefeng Xiao
Y. Zhang
Y. Wang
Shouda Liu
MLLM
MoE
67
1
0
31 Mar 2025
WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training
Z. Wang
Anna Cai
Xinfeng Xie
Zaifeng Pan
Yue Guan
...
Shikai Li
Jianyu Huang
Chris Cai
Yuchen Hao
Yufei Ding
39
2
0
23 Mar 2025
ExpertRAG: Efficient RAG with Mixture of Experts -- Optimizing Context Retrieval for Adaptive LLM Responses
Esmail Gumaan
MoE
28
0
0
23 Mar 2025
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies
Laurie Burchell
Ona de Gibert
Nikolay Arefyev
Mikko Aulamo
Marta Bañón
...
Pavel Stepachev
Jörg Tiedemann
Dušan Variš
Tereza Vojtěchová
Jaume Zaragoza-Bernabeu
43
1
0
13 Mar 2025
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Oskar van der Wal
Pietro Lesci
Max Muller-Eberstein
Naomi Saphra
Hailey Schoelkopf
Willem H. Zuidema
Stella Biderman
LRM
51
0
0
12 Mar 2025
Routing for Large ML Models
Ofir Cohen
Jose Yallouz
Michael Schapira
Shahar Belkar
Tal Mizrahi
51
0
0
07 Mar 2025
Advancing MAPF towards the Real World: A Scalable Multi-Agent Realistic Testbed (SMART)
Jingtian Yan
Zhifei Li
William Kang
Yulun Zhang
Stephen Smith
Jiaoyang Li
43
0
0
03 Mar 2025
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
Guijin Son
Jiwoo Hong
Hyunwoo Ko
James Thorne
LRM
46
5
0
24 Feb 2025
Comprehensive Analysis of Transparency and Accessibility of ChatGPT, DeepSeek, And other SoTA Large Language Models
Ranjan Sapkota
Shaina Raza
Manoj Karkee
40
4
0
21 Feb 2025
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou
Zengzhi Wang
Qian Liu
Junlong Li
Pengfei Liu
ALM
100
14
0
17 Feb 2025
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
Siddharth Singh
Prajwal Singhania
Aditya K. Ranjan
John Kirchenbauer
Jonas Geiping
...
Abhimanyu Hans
Manli Shu
Aditya Tomar
Tom Goldstein
A. Bhatele
94
2
0
12 Feb 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Erik Cambria
LM&MA
AILaw
93
151
0
28 Jan 2025
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms
Yilong Li
Jingyu Liu
Hao Zhang
M Badri Narayanan
Utkarsh Sharma
Shuai Zhang
Pan Hu
Yijing Zeng
Jayaram Raghuram
Suman Banerjee
MQ
39
2
0
10 Jan 2025
Scaling Efficient LLMs
B. N. Kausik
30
3
0
08 Jan 2025
FED: Fast and Efficient Dataset Deduplication Framework with GPU Acceleration
Youngjun Son
Chaewon Kim
Jaejin Lee
45
0
0
02 Jan 2025
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
97
0
0
30 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Haozhao Wang
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
112
1
0
18 Dec 2024
Taming Sensitive Weights: Noise Perturbation Fine-tuning for Robust LLM Quantization
Dongwei Wang
Huanrui Yang
MQ
85
1
0
08 Dec 2024
Exploring AI Text Generation, Retrieval-Augmented Generation, and Detection Technologies: a Comprehensive Overview
Fnu Neha
Deepshikha Bhati
Deepak Kumar Shukla
Angela Guercio
Ben Ward
77
0
0
05 Dec 2024
No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages
Youssef Mohamed
Runjia Li
Ibrahim Said Ahmad
Kilichbek Haydarov
Philip H. S. Torr
Kenneth Ward Church
Mohamed Elhoseiny
VLM
31
7
0
06 Nov 2024
Minder: Faulty Machine Detection for Large-scale Distributed Model Training
Yangtao Deng
Xiang Shi
Zhuo Jiang
X. Zhang
Lei Zhang
...
Fuliang Li
Shuguang Wang
H. Lin
Jianxi Ye
Minlan Yu
LRM
97
2
0
04 Nov 2024
Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
Runsheng Benson Guo
Utkarsh Anand
Arthur Chen
Khuzaima Daudjee
27
1
0
01 Nov 2024
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
64
8
0
29 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Y. Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
60
9
0
25 Oct 2024
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat
Veeranjaneyulu Sadhanala
Afshin Rostamizadeh
Ayan Chakrabarti
Wittawat Jitkrittum
...
Rakesh Shivanna
Sashank J. Reddi
A. Menon
Rohan Anil
Sanjiv Kumar
28
2
0
24 Oct 2024
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan
Tianhong Li
Siyang Qin
Yuanzhen Li
Chen Sun
Michael Rubinstein
Deqing Sun
Kaiming He
Yonglong Tian
VLM
DiffM
35
41
0
17 Oct 2024
FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training
Tianyuan Wu
Wei Wang
Yinghao Yu
Siran Yang
Wenchao Wu
Qinkai Duan
Guodong Yang
Jiamang Wang
Lin Qu
Liping Zhang
22
6
0
16 Oct 2024
Scaling Laws for Multilingual Language Models
Yifei He
Alon Benhaim
Barun Patra
Praneetha Vaddamanu
Sanchit Ahuja
Parul Chopra
Vishrav Chaudhary
Han Zhao
Xia Song
28
3
0
15 Oct 2024
From promise to practice: realizing high-performance decentralized training
Zesen Wang
Jiaojiao Zhang
Xuyang Wu
M. Johansson
13
0
0
15 Oct 2024
DemoShapley: Valuation of Demonstrations for In-Context Learning
Shan Xie
Man Luo
Chadly Daniel Stern
Mengnan Du
Lu Cheng
25
1
0
10 Oct 2024
Personalized Visual Instruction Tuning
Renjie Pi
Jianshu Zhang
Tianyang Han
Jipeng Zhang
Rui Pan
Tong Zhang
MLLM
29
6
0
09 Oct 2024
ESPACE: Dimensionality Reduction of Activations for Model Compression
Charbel Sakr
Brucek Khailany
20
3
0
07 Oct 2024
Realizing Video Summarization from the Path of Language-based Semantic Understanding
Kuan-Chen Mu
Zhi-Yi Chin
Wei-Chen Chiu
13
0
0
06 Oct 2024
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Qin Liu
Wenjie Mo
Terry Tong
Jiashu Xu
Fei Wang
Chaowei Xiao
Muhao Chen
AAML
31
4
0
30 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Boxin Wang
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Wei Ping
MLLM
VLM
LRM
40
51
0
17 Sep 2024
CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration
Jiahui Gao
Renjie Pi
Tianyang Han
Han Wu
Lanqing Hong
Lingpeng Kong
Xin Jiang
Zhenguo Li
39
5
0
17 Sep 2024
AlpaPICO: Extraction of PICO Frames from Clinical Trial Documents Using LLMs
Madhusudan Ghosh
Shrimon Mukherjee
Asmit Ganguly
Partha Basuchowdhuri
S. Naskar
Debasis Ganguly
29
7
0
15 Sep 2024
Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
Gentiana Rashiti
G. Karunaratne
Mrinmaya Sachan
Abu Sebastian
Abbas Rahimi
RALM
32
0
0
12 Sep 2024
FreeRide: Harvesting Bubbles in Pipeline Parallelism
Jiashu Zhang
Zihan Pan
Molly Xu
Khuzaima S. Daudjee
88
0
0
11 Sep 2024
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
Zhen Yang
Jinhao Chen
Zhengxiao Du
Wenmeng Yu
Weihan Wang
Wenyi Hong
Zhihuan Jiang
Bin Xu
Yuxiao Dong
Jie Tang
VLM
LRM
32
8
0
10 Sep 2024
LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
Mo Sun
Zihan Yang
Changyue Liao
Yingtao Li
Fei Wu
Zeke Wang
37
1
0
02 Sep 2024
A Survey of Large Language Models for European Languages
Wazir Ali
S. Pyysalo
39
2
0
27 Aug 2024
ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws
Ruihang Li
Yixuan Wei
Miaosen Zhang
Nenghai Yu
Han Hu
Houwen Peng
40
2
0
15 Aug 2024
GPT-3 Powered Information Extraction for Building Robust Knowledge Bases
Ritabrata Roy Choudhury
Soumik Dey
18
1
0
31 Jul 2024