Sequence Parallelism: Long Sequence Training from System Perspective

Annual Meeting of the Association for Computational Linguistics (ACL), 2021

26 May 2021

Yang You

Papers citing "Sequence Parallelism: Long Sequence Training from System Perspective"

50 / 74 papers shown

RELIC: Interactive Video World Model with Long-Horizon Memory

...

306

03 Dec 2025

PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling

134

15 Nov 2025

In-Context Learning with Unpaired Clips for Instruction-based Video Editing

131

16 Oct 2025

TASP: Topology-aware Sequence Parallelism

165

30 Sep 2025

A Scalable Distributed Framework for Multimodal GigaVoxel Image Registration

131

29 Sep 2025

RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training

...

208

25 Sep 2025

Learning to Shard: RL for Co-optimizing the Parallelism Degrees and Per-operator Sharding Dimensions in Distributed LLM Inference

29 Aug 2025

TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill and Decode Inference

197

21 Aug 2025

Modality Agnostic Efficient Long Range Encoder

T. Parag

Ahmed Elgammal

158

25 Jul 2025

Accelerating Parallel Diffusion Model Serving with Residual Compression

245

23 Jul 2025

ContentV: Efficient Training of Video Generation Models with Limited Compute

...

442

05 Jun 2025

Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks

264

02 Jun 2025

100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

210

25 May 2025

FlashForge: Ultra-Efficient Prefix-Aware Attention for LLM Decoding

...

224

23 May 2025

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

265

22 May 2025

PaTH Attention: Position Encoding via Accumulating Householder Transformations

878

22 May 2025

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production

...

411

16 May 2025

Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution

269

04 May 2025

SlimPipe: Memory-Thrifty and Efficient Pipeline Parallelism for Long-Context LLM Training

212

20 Apr 2025

Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training

252

12 Apr 2025

Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices

IEEE Conference on Computer Communications (IEEE INFOCOM), 2025

370

11 Apr 2025

Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training

408

31 Mar 2025

Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks

386

21 Mar 2025

ATTENTION2D: Communication Efficient Distributed Self-Attention Mechanism

Venmugil Elango

426

20 Mar 2025

AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

421

16 Mar 2025

Seesaw: High-throughput LLM Inference via Model Re-sharding

Qidong Su

Wei Zhao

Xuelong Li

Muralidhar Andoorveedu

366

09 Mar 2025

APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

386

17 Feb 2025

LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion

437

25 Jan 2025

A Survey on Memory-Efficient Transformer-Based Model Training in AI for Science

375

21 Jan 2025

TokenRing: An Efficient Parallelism Framework for Infinite-Context LLMs via Bidirectional Communication

310

29 Dec 2024

FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism

International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024

391

02 Dec 2024

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

469

20 Nov 2024

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training

1.2K

20 Nov 2024

Context Parallelism for Scalable Million-Token Inference

473

04 Nov 2024

Reducing the Cost of Dropout in Flash-Attention by Hiding RNG with GEMM

Haiyue Ma

Jian Liu

Ronny Krashinsky

226

10 Oct 2024

FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding

European Conference on Artificial Intelligence (ECAI), 2024

Zhengyang Shen

Jinwen Ma

166

09 Oct 2024

How to Train Long-Context Language Models (Effectively)

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

665

03 Oct 2024

No Request Left Behind: Tackling Heterogeneity in Long-Context LLM Inference with Medha

560

25 Sep 2024

PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference

Zeyu Zhang

Haiying Shen

VLM

349

23 Sep 2024

Achieving Peak Performance for Large Language Models: A Systematic Review

IEEE Access, 2024

Z. R. K. Rostam

Sándor Szénási

Gábor Kertész

321

07 Sep 2024

Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference

Neural Information Processing Systems (NeurIPS), 2024

R. Prabhakar

Hengrui Zhang

D. Wentzlaff

294

14 Aug 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey

...

Dahua Lin

Yonggang Wen

Xin Jin

Tianwei Zhang

Yang Liu

369

29 Jul 2024

MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training

265

22 Jul 2024

TorchGT: A Holistic System for Large-scale Graph Transformer Training

236

19 Jul 2024

Scaling Granite Code Models to 128K Context

...

266

18 Jul 2024

Inference Optimization of Foundation Models on AI Accelerators

Matthäus Kleindessner

313

12 Jul 2024

WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem

James Demmel

219

30 Jun 2024

A Survey on Mixture of Experts in Large Language Models

477

26 Jun 2024

Long Context Transfer from Language to Vision

Peiyuan Zhang

Kaichen Zhang

Bo Li

Guangtao Zeng

Chunyuan Li

Ziwei Liu

VLM

315

349

24 Jun 2024

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

702

120

24 Jun 2024