MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training

22 July 2024

Papers citing "MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training"


Title
MOM: Memory-Efficient Offloaded Mini-Sequence Inference for Long Context Language Models | Junyang Zhang, Tianyi Zhu, Cheng Luo, A. Anandkumar | RALM | 42 | 0 | 0 | 16 Apr 2025
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer | Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, A. Shafi, D. Panda | - | 28 | 2 | 0 | 30 Aug 2024
EDGAR-CORPUS: Billions of Tokens Make The World Go Round | Lefteris Loukas, Manos Fergadiotis, Ion Androutsopoulos, Prodromos Malakasiotis | AIFin | 71 | 29 | 0 | 29 Sep 2021
Big Bird: Transformers for Longer Sequences | Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed | VLM | 249 | 1,982 | 0 | 28 Jul 2020
Efficient Content-Based Sparse Attention with Routing Transformers | Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier | MoE | 234 | 578 | 0 | 12 Mar 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro | MoE | 243 | 1,791 | 0 | 17 Sep 2019
Optimal Distributed Online Prediction using Mini-Batches | O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao | - | 164 | 684 | 0 | 07 Dec 2010