MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
arXiv:2407.15892 · 22 July 2024
Cheng Luo, Jiawei Zhao, Zhuoming Chen, Beidi Chen, A. Anandkumar
Papers citing "MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training" (7 papers):

MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models
Junyang Zhang, Tianyi Zhu, Cheng Luo, A. Anandkumar · RALM · 16 Apr 2025

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, A. Shafi, D. Panda · 30 Aug 2024

EDGAR-CORPUS: Billions of Tokens Make The World Go Round
Lefteris Loukas, Manos Fergadiotis, Ion Androutsopoulos, Prodromos Malakasiotis · AIFin · 29 Sep 2021

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · VLM · 28 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier · MoE · 12 Mar 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · MoE · 17 Sep 2019

Optimal Distributed Online Prediction using Mini-Batches
O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao · 07 Dec 2010