Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
H. H. Mao · 9 October 2022 · arXiv:2210.04243
Papers citing "Fine-Tuning Pre-trained Transformers into Decaying Fast Weights" (5 of 5 shown):
Conformal Transformations for Symmetric Power Transformers
Saurabh Kumar, Jacob Buckman, Carles Gelada, Sean Zhang
05 Mar 2025
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He, Philip N. Garner
09 Oct 2024
Linear Attention Sequence Parallelism
Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong
03 Apr 2024
ABC: Attention with Bounded-memory Control
Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith
06 Oct 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
31 Dec 2020