ResearchTrend.AI


Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

17 September 2019
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
    MoE

Papers citing "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism"

50 / 1,328 papers shown
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Inclusion AI
Bowen Ma
Cheng Zou
C. Yan
Chunxiang Jin
...
Zhiqiang Fang
Zhihao Qiu
Ziyuan Huang
Zizheng Yang
Zhengyu He
MLLM, MoE, AuLLM, VLM, LRM
427
11
0
27 Mar 2026
PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch
Abhishek Ghosh
Ajay Nayak
Ashish Panwar
Arkaprava Basu
GNN
531
2
0
24 Dec 2025
RELIC: Interactive Video World Model with Long-Horizon Memory
Yicong Hong
Yiqun Mei
Chongjian Ge
Yiran Xu
Yang Zhou
...
Eli Shechtman
Kalyan Sunkavalli
Feng Liu
Z. Li
Hao Tan
VGen, VLM
410
22
0
03 Dec 2025
MLPMoE: Zero-Shot Architectural Metamorphosis of Dense LLM MLPs into Static Mixture-of-Experts
Ivan Novikov
MoE
642
0
0
26 Nov 2025
Periodic Asynchrony: An On-Policy Approach for Accelerating LLM Reinforcement Learning
Jian Lu
323
0
0
24 Nov 2025
From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence
J. Yang
Wei Emma Zhang
Shark Liu
J. Wu
Shawn Guo
...
Zizheng Zhan
Jiajun Zhang
Jie Zhang
Zhaoxiang Zhang
Bo Zheng
LLMAG, ALM, ELM
877
0
0
23 Nov 2025
Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design
Quentin G. Anthony
Yury Tokpanov
Skyler Szot
Srivatsan Rajagopal
Praneeth Medepalli
...
Emad Barsoum
Zhenyu Gu
Yao Fu
Beren Millidge
MoE, VLM, LRM
332
0
0
21 Nov 2025
A Scalable NorthPole System with End-to-End Vertical Integration for Low-Latency and Energy-Efficient LLM Inference
M. DeBole
R. Appuswamy
Neil McGlohon
B. Taba
S. K. Esser
...
Ignacio Terrizzano
Takanori Ueda
Trent Gray-Donald
David Cox
D. Modha
131
0
0
20 Nov 2025
SALPA: Spaceborne LiDAR Point Adjustment for Enhanced GEDI Footprint Geolocation
Narumasa Tsutsumida
Rei Mitsuhashi
Yoshito Sawada
Akira Kato
84
1
0
18 Nov 2025
Global Cross-Time Attention Fusion for Enhanced Solar Flare Prediction from Multivariate Time Series
Onur Vural
S. M. Hamdi
S. F. Boubrahimi
AI4TS
193
0
0
17 Nov 2025
P1: Mastering Physics Olympiads with Reinforcement Learning
Jiacheng Chen
Qianjia Cheng
F. Yu
Haiyuan Wan
Yuchen Zhang
...
Yu Cheng
Ning Ding
Bowen Zhou
Peng Ye
Ganqu Cui
ReLM, LRM, AI4CE
386
2
0
17 Nov 2025
BitSnap: Checkpoint Sparsification and Quantization in LLM Training
Yanxin Peng
Qingping Li
Baodong Wu
Shigang Li
Guohao Dai
Shengen Yan
Yu Wang
MQ
360
0
0
15 Nov 2025
Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput
Jingwei Song
Wanyi Chen
Xinyuan Song
Chris Tong
Gufeng Chen
Tianyi Zhao
Eric Yang
Bill Shi
Lynn Ai
105
1
0
13 Nov 2025
STAGE: A Symbolic Tensor grAph GEnerator for distributed AI system co-design
Changhai Man
Joongun Park
Hanjiang Wu
Huan Xu
Srinivas Sridharan
Tushar Krishna
351
0
0
13 Nov 2025
MOSS: Efficient and Accurate FP8 LLM Training with Microscaling and Automatic Scaling
Yu Zhang
Hui-Ling Zhen
Mingxuan Yuan
Bei Yu
MQ
387
1
0
08 Nov 2025
Can LLM Infer Risk Information From MCP Server System Logs?
Jiayi Fu
Qiyao Sun
Yinggui Wang
207
0
0
08 Nov 2025
NVIDIA Nemotron Nano V2 VL
Nvidia
Amala Sanjay Deshmukh
Kateryna Chumachenko
Tuomas Rintamaki
Matthieu Le
...
Krzysztof Pawelec
Michael Evans
Katherine Luna
Jie Lou
Erick Galinkin
VLM
405
5
0
06 Nov 2025
TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training
Michael Menezes
Barbara Su
Xinze Feng
Yehya Farhat
Hamza Shili
Anastasios Kyrillidis
224
1
0
06 Nov 2025
PETRA: Pretrained Evolutionary Transformer for SARS-CoV-2 Mutation Prediction
Xu Zou
MedIm, AI4TS
177
1
0
06 Nov 2025
AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs
Ran Yan
Youhe Jiang
Tianyuan Wu
Jiaxuan Gao
Zhiyu Mei
Wei Fu
Haohui Mai
Wei Wang
Y. Wu
Binhang Yuan
OffRL
212
4
0
02 Nov 2025
HPLT 3.0: Very Large-Scale Multilingual Resources for LLM and MT. Mono- and Bi-lingual Data, Multilingual Evaluation, and Pre-Trained Models
Stephan Oepen
Nikolay Arefev
Mikko Aulamo
Marta Bañón
Maja Buljan
...
Teemu Vahtola
Dušan Variš
Fedor Vitiugin
Tea Vojtěchová
Jaume Zaragoza
254
3
0
02 Nov 2025
Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse
Shaojie Wang
Jinghui Wang
Yinghan Cui
Xuxing Chen
Chao Wang
...
Xiaojiang Zhang
J. Peng
Li Wan
Haotian Zhang
Bin Chen
199
2
0
01 Nov 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM, VLM
661
16
0
31 Oct 2025
Exploring Landscapes for Better Minima along Valleys
Tong Zhao
Jiacheng Li
Yuanchang Zhou
Guangming Tan
Weile Jia
136
1
0
31 Oct 2025
Emu3.5: Native Multimodal Models are World Learners
Yufeng Cui
Honghao Chen
Haoge Deng
X. Y. Huang
Xinghang Li
...
Zhuo Chen
Yulong Ao
Tiejun Huang
Zhongyuan Wang
Xinlong Wang
MLLM, VGen
564
57
0
30 Oct 2025
Defeating the Training-Inference Mismatch via FP16
Penghui Qi
Zichen Liu
Xiangxin Zhou
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
237
23
0
30 Oct 2025
Jasmine: A Simple, Performant and Scalable JAX-based World Modeling Codebase
Mihir Mahajan
Alfred Nguyen
Franz Srambical
Stefan Bauer
233
0
0
30 Oct 2025
AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis
Xuanzhong Chen
Zile Qiao
Guoxin Chen
L. Su
Zhen Zhang
Xinyu Wang
Pengjun Xie
Fei Huang
Jingren Zhou
Yong Jiang
LLMAG, ELM
206
6
0
28 Oct 2025
REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects
Yassine El Ouahidi
Jonathan Lys
Philipp Tholke
Nicolas Farrugia
Bastien Pasdeloup
Vincent Gripon
Karim Jerbi
G. Lioi
AI4TS, VLM
187
14
0
24 Oct 2025
AsyncHZP: Hierarchical ZeRO Parallelism with Asynchronous Scheduling for Scalable LLM Training
Huawei Bai
Yifan Huang
Wenqi Shi
Ansheng You
Feifan Shao
Tengfei Han
Minghui Yu
132
0
0
23 Oct 2025
Collective Communication for 100k+ GPUs
Min Si
Pavan Balaji
Yongzhou Chen
Ching-Hsiang Chu
Adi Gangidi
...
Yimeng Zhao
Shengbao Zheng
Art Zhu
Hongyi Zeng
GNN, AI4CE
468
9
0
23 Oct 2025
RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs
Yongji Wu
Xueshen Liu
Haizhong Zheng
Juncheng Gu
Beidi Chen
Z. Morley Mao
Arvind Krishnamurthy
Eric Liang
OffRL, SILM, OnRL
388
1
0
22 Oct 2025
MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
Xinfeng Xia
Jiacheng Liu
Xiaofeng Hou
Peng Tang
Mingxuan Zhang
Wenfeng Wang
Chao Li
MoE
218
0
0
22 Oct 2025
HybridEP: Scaling Expert Parallelism to Cross-Datacenter Scenario via Hybrid Expert/Data Transmission
Weihao Yang
Hao Huang
Donglei Wu
Ningke Li
Yanqi Pan
Qiyang Zheng
Wen Xia
Shiyi Li
Qiang Wang
MoE
199
1
0
22 Oct 2025
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
Qi Li
Junpan Wu
Xiang Liu
Yuxin Wang
Z. Li
Zhenheng Tang
Yuhan Chen
Shaohuai Shi
Xiaowen Chu
ReLM, LRM
325
1
0
21 Oct 2025
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
Ling Team
Anqi Shen
B. Li
Bin Hu
Bin Jing
...
Z. Pan
Longxiang Zhang
Zhenzhong Lan
Zhiqiang Ding
Zhiqiang Zhang
ALM, ReLM, LRM
367
15
0
21 Oct 2025
Scaling Laws Meet Model Architecture: Toward Inference-Efficient LLMs
S. Bian
Tao Yu
Shivaram Venkataraman
Youngsuk Park
171
1
0
21 Oct 2025
Efficient Long-context Language Model Training by Core Attention Disaggregation
Yonghao Zhuang
Junda Chen
Bo Pang
Yi Gu
Yibo Zhu
Yimin Jiang
Eric Liang
Eric Xing
Hao Zhang
172
0
0
20 Oct 2025
ReXMoE: Reusing Experts with Minimal Overhead in Mixture-of-Experts
Zheyue Tan
Ruoyao Xiao
Tao Yuan
Dong Zhou
Weilin Liu
...
Haiyang Xu
Boxun Li
Guohao Dai
Bo Zhao
Yu Wang
MoE
242
1
0
20 Oct 2025
MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
Yongshun Zhang
Zhongyi Fan
Yonghang Zhang
Zhangzikang Li
Weifeng Chen
Zhongwei Feng
Chaoyue Wang
Peng Hou
Anxiang Zeng
VGen
361
0
0
20 Oct 2025
MuonBP: Faster Muon via Block-Periodic Orthogonalization
Ahmed Khaled
Kaan Ozkara
Tao Yu
Mingyi Hong
Youngsuk Park
149
12
0
19 Oct 2025
MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
Huining Yuan
Zelai Xu
Zheyue Tan
Xiangmin Yi
Mo Guang
...
Xinlei Chen
Bo Zhao
Xiao-Ping Zhang
Chao Yu
Yu Wang
LLMAG, LRM
217
0
0
17 Oct 2025
First Attentions Last: Better Exploiting First Attentions for Efficient Transformer Training
Gyudong Kim
Hyukju Na
Jin Hyeon Kim
Hyunsung Jang
Jaemin Park
J. Hwang
Namkoo Ha
Seungryong Kim
Young Geun Kim
159
0
0
16 Oct 2025
Mirror Speculative Decoding: Breaking the Serial Barrier in LLM Inference
Nikhil Bhendawade
K. Nishu
Arnav Kundu
Chris Bartels
Minsik Cho
Irina Belousova
LRM
423
1
0
15 Oct 2025
NOSA: Native and Offloadable Sparse Attention
Yuxiang Huang
Chaojun Xiao
Xu Han
Zhiyuan Liu
Zhou Su
...
Hengyu Zhao
Yudong Wang
Chaojun Xiao
Xu Han
Zhiyuan Liu
MQ
223
0
0
15 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro, AIFin, AI4TS, LRM, AI4CE
303
12
0
13 Oct 2025
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
Wenhan Ma
Hailin Zhang
Liang Zhao
Yifan Song
Yudong Wang
Zhifang Sui
Fuli Luo
MoE
359
19
0
13 Oct 2025
Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony
H. Lu
Zichen Liu
Shaopan Xiong
Yancheng He
W. Gao
...
Wei Wang
Wenbo Su
Jiamang Wang
Lin Qu
Bo Zheng
OffRL
123
2
0
13 Oct 2025
DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism
Symposium on Operating Systems Principles (SOSP), 2025
Chenyu Jiang
Zhenkun Cai
Ye Tian
Zhen Jia
Yida Wang
Chuan Wu
148
0
0
12 Oct 2025
A Unified Framework for Lifted Training and Inversion Approaches
Xiaoyu Wang
Alexandra Valavanis
Azhir Mahmood
Andreas Mang
Martin Benning
Audrey Repetti
190
0
0
10 Oct 2025
Page 1 of 27