Sequence Parallelism: Long Sequence Training from System Perspective

Annual Meeting of the Association for Computational Linguistics (ACL), 2021
26 May 2021 · arXiv:2105.13120
Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You
Links: arXiv abs · PDF · HTML · HuggingFace (6 upvotes)

Papers citing "Sequence Parallelism: Long Sequence Training from System Perspective"

Showing 24 of 74 citing papers (page 2 of 2).
Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen
198 · 42 · 0 · 27 May 2024
360Zhinao Technical Report
360Zhinao Team
221 · 0 · 0 · 22 May 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving
Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan
Tags: LLMAG
292 · 5 · 0 · 18 May 2024
KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
International Conference on Machine Learning (ICML), 2024
Minsik Cho, Mohammad Rastegari, Devang Naik
217 · 11 · 0 · 08 May 2024
Towards Green AI: Current Status and Future Research
Christian Clemm, Lutz Stobbe, Kishan Wimalawarne, Jan Druschke
258 · 6 · 0 · 01 May 2024
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism
Bingya Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin
Tags: RALM
366 · 113 · 0 · 15 Apr 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun
Tags: GNN
274 · 10 · 0 · 14 Mar 2024
Yi: Open Foundation Models by 01.AI
01.AI, Alex Young, Bei Chen, Chao Li, ..., Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai
Tags: OSLM, LRM
840 · 768 · 0 · 07 Mar 2024
Data Engineering for Scaling Language Models to 128K Context
Yao Fu, Yikang Shen, Xinyao Niu, Xiang Yue, Hanna Hajishirzi, Yoon Kim, Hao-Chun Peng
Tags: MoE
358 · 183 · 0 · 15 Feb 2024
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
Qiaoling Chen, Diandian Gu, Guoteng Wang, Xun Chen, Yingtong Xiong, ..., Qi Hu, Xin Jin, Yonggang Wen, Tianwei Zhang, Yang Liu
306 · 10 · 0 · 17 Jan 2024
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Bin Lin, Chen Zhang, Tao Peng, Hanyu Zhao, Wencong Xiao, ..., Shen Li, Zhigang Ji, Tao Xie, Yong Li, Jialin Li
304 · 76 · 0 · 05 Jan 2024
Unicron: Economizing Self-Healing LLM Training at Scale
Tao He, Xue Li, Zhibin Wang, Kun Qian, Jingbo Xu, Wenyuan Yu, Jingren Zhou
216 · 28 · 0 · 30 Dec 2023
Towards Message Brokers for Generative AI: Survey, Challenges, and Opportunities
Alaa Saleh, Roberto Morabito, Sasu Tarkoma, Susanna Pirttikangas, Lauri Lovén
340 · 10 · 0 · 22 Dec 2023
Gated Linear Attention Transformers with Hardware-Efficient Training
Aaron Courville, Bailin Wang, Songlin Yang, Yikang Shen, Yoon Kim
477 · 303 · 0 · 11 Dec 2023
Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections
Symposium on Operating Systems Principles (SOSP), 2023
Marcel Wagenlander, Guo Li, Bo Zhao, Kai Zou, Peter R. Pietzuch
350 · 13 · 0 · 08 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang
397 · 33 · 0 · 01 Dec 2023
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Yunpeng Huang, Jingwei Xu, Junyu Lai, Zixu Jiang, Taolue Chen, ..., Xiaoxing Ma, Lijuan Yang, Zhou Xin, Shupeng Li, Penghao Zhao
Tags: LLMAG, KELM
373 · 101 · 0 · 21 Nov 2023
Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
M. Ibrahim, Shaizeen Aga, Ada Li, Suchita Pati, Mahzabeen Islam
275 · 8 · 0 · 08 Nov 2023
Ultra-Long Sequence Distributed Transformer
Xiao Wang, Isaac Lyngaas, A. Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, Ravi Tandon
336 · 5 · 0 · 04 Nov 2023
Scaling Laws of RoPE-based Extrapolation
International Conference on Learning Representations (ICLR), 2023
Xiaoran Liu, Hang Yan, Shuo Zhang, Chen An, Xipeng Qiu, Dahua Lin
265 · 117 · 0 · 08 Oct 2023
Ring Attention with Blockwise Transformers for Near-Infinite Context
International Conference on Learning Representations (ICLR), 2023
Hao Liu, Matei A. Zaharia, Pieter Abbeel
657 · 393 · 0 · 03 Oct 2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
S. A. Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, L. Song, Samyam Rajbhandari, Yuxiong He
378 · 180 · 0 · 25 Sep 2023
Reducing Activation Recomputation in Large Transformer Models
Conference on Machine Learning and Systems (MLSys), 2022
V. Korthikanti, Jared Casper, Sangkug Lym, Lawrence C. McAfee, M. Andersch, Mohammad Shoeybi, Bryan Catanzaro
Tags: AI4CE
300 · 388 · 0 · 10 May 2022
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
International Conference on Parallel Processing (ICPP), 2021
Yongbin Li, Hongxin Liu, Zhengda Bian, Boxiang Wang, Haichen Huang, Fan Cui, Chuan-Qing Wang, Yang You
Tags: GNN
282 · 190 · 0 · 28 Oct 2021