2110.14883
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
28 October 2021
Yongbin Li
Hongxin Liu
Zhengda Bian
Boxiang Wang
Haichen Huang
Fan Cui
Chuan-Qing Wang
Yang You
GNN
Papers citing
"Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training"
50 / 81 papers shown
Plexus: Taming Billion-edge Graphs with 3D Parallel GNN Training
Aditya K. Ranjan
Siddharth Singh
Cunyang Wei
A. Bhatele
GNN
48
0
0
07 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
C. L. P. Chen
J. Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
55
0
0
30 Apr 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu (Allen) Zhang
Gaojie Jin
X. Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
35
0
0
24 Feb 2025
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
Siddharth Singh
Prajwal Singhania
Aditya K. Ranjan
John Kirchenbauer
Jonas Geiping
...
Abhimanyu Hans
Manli Shu
Aditya Tomar
Tom Goldstein
A. Bhatele
94
2
0
12 Feb 2025
A Survey on Memory-Efficient Large-Scale Model Training in AI for Science
Kaiyuan Tian
Linbo Qiao
Baihui Liu
Gongqingjian Jiang
Dongsheng Li
31
0
0
21 Jan 2025
Generative AI Takes a Statistics Exam: A Comparison of Performance between ChatGPT3.5, ChatGPT4, and ChatGPT4o-mini
Monnie McGee
Bivin Sadler
ELM
53
1
0
17 Jan 2025
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Haozhao Wang
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
115
1
0
18 Dec 2024
Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
Zhihong Liu
Xin Xu
Peng Qiao
Dongsheng Li
OffRL
20
2
0
08 Nov 2024
BATON: Enhancing Batch-wise Inference Efficiency for Large Language Models via Dynamic Re-batching
Peizhuang Cong
Qizhi Chen
Haochen Zhao
Tong Yang
KELM
21
1
0
24 Oct 2024
Understanding and Alleviating Memory Consumption in RLHF for LLMs
Jin Zhou
Hanmei Yang
Steven Tang
Mingcan Xiang
Hui Guan
Tongping Liu
31
0
0
21 Oct 2024
Pipeline Gradient-based Model Training on Analog In-memory Accelerators
Zhaoxian Wu
Quan-Wu Xiao
Tayfun Gokmen
H. Tsai
K. E. Maghraoui
Tianyi Chen
16
1
0
19 Oct 2024
TiMePReSt: Time and Memory Efficient Pipeline Parallel DNN Training with Removed Staleness
Ankita Dutta
Nabendu Chaki
Rajat K. De
22
0
0
18 Oct 2024
Comprehensive Performance Modeling and System Design Insights for Foundation Models
Shashank Subramanian
Ermal Rrapaj
Peter Harrington
Smeet Chheda
S. Farrell
Brian Austin
Samuel Williams
N. Wright
W. Bhimji
37
0
0
30 Sep 2024
EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models
Shaoxiong Ji
Zihao Li
Indraneil Paul
Jaakko Paavola
Peiqin Lin
...
Dayyán O'Brien
Hengyu Luo
Hinrich Schütze
Jörg Tiedemann
Barry Haddow
CLL
35
3
0
26 Sep 2024
Achieving Peak Performance for Large Language Models: A Systematic Review
Z. R. K. Rostam
Sándor Szénási
Gábor Kertész
32
3
0
07 Sep 2024
LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
Mo Sun
Zihan Yang
Changyue Liao
Yingtao Li
Fei Wu
Zeke Wang
52
1
0
02 Sep 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Peng Sun
71
8
0
29 Jul 2024
WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem
Ziming Liu
Shaoyu Wang
Shenggan Cheng
Zhongkai Zhao
Xuanlei Zhao
James Demmel
Yang You
32
0
0
30 Jun 2024
AI-coupled HPC Workflow Applications, Middleware and Performance
Wes Brewer
Ana Gainaru
Frédéric Suter
Feiyi Wang
M. Emani
S. Jha
30
10
0
20 Jun 2024
ProTrain: Efficient LLM Training via Memory-Aware Techniques
Hanmei Yang
Jin Zhou
Yao Fu
Xiaoqun Wang
Ramine Roane
Hui Guan
Tongping Liu
VLM
28
0
0
12 Jun 2024
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Chaofan Lin
Zhenhua Han
Chengruidong Zhang
Yuqing Yang
Fan Yang
Chen Chen
Lili Qiu
71
38
0
30 May 2024
2BP: 2-Stage Backpropagation
Christopher Rae
Joseph K. L. Lee
James Richings
MoE
MQ
31
0
0
28 May 2024
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Jian Hu
Xibin Wu
Weixun Wang
OpenLLMAI Team
Dehao Zhang
Yu Cao
AI4CE
VLM
19
90
0
20 May 2024
USP: A Unified Sequence Parallelism Approach for Long Context Generative AI
Jiarui Fang
Shangchun Zhao
32
15
0
13 May 2024
A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model
Jiexia Ye
Weiqi Zhang
Ke Yi
Yongzi Yu
Ziyue Li
Jia Li
Fugee Tsung
AI4TS
AI4CE
43
22
0
03 May 2024
DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines
Ye Tian
Zhen Jia
Ziyue Luo
Yida Wang
Chuan Wu
AI4CE
16
2
0
02 May 2024
Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications
Quan Zhang
Binqi Zeng
Chijin Zhou
Gwihwan Go
Heyuan Shi
Yu Jiang
SILM
AAML
32
19
0
26 Apr 2024
LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs
Taeho Kim
Yanming Wang
Vatshank Chaturvedi
Lokesh Gupta
Seyeon Kim
Yongin Kwon
Sangtae Ha
36
4
0
16 Apr 2024
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Taishi Nakamura
Mayank Mishra
Simone Tedeschi
Yekun Chai
Jason T Stillerman
...
Virendra Mehta
Matthew Blumberg
Victor May
Huu Nguyen
S. Pyysalo
LRM
23
7
0
30 Mar 2024
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng
Richong Zhang
Junhao Zhang
Yanhan Ye
Zheyan Luo
Zhangchi Feng
Yongqiang Ma
30
364
0
20 Mar 2024
Online Training of Large Language Models: Learn while chatting
Juhao Liang
Ziwei Wang
Zhuoheng Ma
Jianquan Li
Zhiyi Zhang
Xiangbo Wu
Benyou Wang
KELM
37
3
0
04 Mar 2024
Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks
Fakhraddin Alwajih
El Moatez Billah Nagoudi
Gagan Bhatia
Abdelrahman Mohamed
Muhammad Abdul-Mageed
VLM
LRM
27
11
0
01 Mar 2024
Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement
Yihong Tang
Jiao Ou
Che Liu
Fuzheng Zhang
Di Zhang
Kun Gai
42
4
0
16 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
36
46
0
15 Feb 2024
ZeroPP: Unleashing Exceptional Parallelism Efficiency through Tensor-Parallelism-Free Methodology
Ding Tang
Lijuan Jiang
Jiecheng Zhou
Minxi Jin
Hengjie Li
Xingcheng Zhang
Zhiling Pei
Jidong Zhai
62
3
0
06 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
29
27
0
05 Feb 2024
Accelerating Heterogeneous Tensor Parallelism via Flexible Workload Control
Zhigang Wang
Xu Zhang
Ning Wang
Chuanfei Xu
Jie Nie
Zhiqiang Wei
Yu Gu
Ge Yu
11
0
0
21 Jan 2024
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
Cong Guo
Rui Zhang
Jiale Xu
Jingwen Leng
Zihan Liu
...
Minyi Guo
Hao Wu
Shouren Zhao
Junping Zhao
Ke Zhang
VLM
72
10
0
16 Jan 2024
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning
Yutao Zhu
Peitian Zhang
Chenghao Zhang
Yifei Chen
Binyu Xie
Zheng Liu
Ji-Rong Wen
Zhicheng Dou
21
15
0
12 Jan 2024
Understanding LLMs: A Comprehensive Overview from Training to Inference
Yi-Hsueh Liu
Haoyang He
Tianle Han
Xu-Yao Zhang
Mengyuan Liu
...
Xintao Hu
Tuo Zhang
Ning Qiang
Tianming Liu
Bao Ge
SyDa
19
65
0
04 Jan 2024
Towards Message Brokers for Generative AI: Survey, Challenges, and Opportunities
Alaa Saleh
Roberto Morabito
Sasu Tarkoma
Susanna Pirttikangas
Lauri Lovén
58
3
0
22 Dec 2023
An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
Youshao Xiao
Weichang Wu
Zhenglei Zhou
Fagui Mao
Shangchun Zhao
Lin Ju
Lei Liang
Xiaolu Zhang
Jun Zhou
21
5
0
19 Dec 2023
Moirai: Towards Optimal Placement for Distributed Inference on Heterogeneous Devices
Beibei Zhang
Hongwei Zhu
Feng Gao
Zhihui Yang
Xiaoyang Sean Wang
14
1
0
07 Dec 2023
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
Kai Lv
Shuo Zhang
Tianle Gu
Shuhao Xing
Jiawei Hong
...
Tengxiao Liu
Yu Sun
Penousal Machado
Hang Yan
Xipeng Qiu
35
7
0
01 Dec 2023
Robot Learning in the Era of Foundation Models: A Survey
Xuan Xiao
Jiahang Liu
Zhipeng Wang
Yanmin Zhou
Yong Qi
Qian Cheng
Bin He
Shuo Jiang
AI4CE
LM&Ro
21
26
0
24 Nov 2023
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang
Jingwei Xu
Junyu Lai
Zixu Jiang
Taolue Chen
...
Xiaoxing Ma
Lijuan Yang
Zhou Xin
Shupeng Li
Penghao Zhao
LLMAG
KELM
28
54
0
21 Nov 2023
Applications of Large Scale Foundation Models for Autonomous Driving
Yu Huang
Yue Chen
Zhu Li
ELM
AI4CE
LRM
ALM
LM&Ro
48
15
0
20 Nov 2023
AMSP: Reducing Communication Overhead of ZeRO for Efficient LLM Training
Qiaoling Chen
Qi Hu
Guoteng Wang
Zhisheng Ye
Ting Huang
...
Yang Gao
Hang Yan
Yonggang Wen
Tianwei Zhang
Peng Sun
37
6
0
01 Nov 2023
FP8-LM: Training FP8 Large Language Models
Houwen Peng
Kan Wu
Yixuan Wei
Guoshuai Zhao
Yuxiang Yang
...
Zheng-Wei Zhang
Shuguang Liu
Joe Chau
Han Hu
Peng Cheng
MQ
59
38
0
27 Oct 2023
Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic
Sabri Boughorbel
Majd Hawasly
27
8
0
23 Oct 2023