TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax
arXiv: 2403.02920, 5 March 2024
Tobias Christian Nauen, Sebastián M. Palacio, Andreas Dengel
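The title refers to the Taylor-softmax trick: replacing the exponential inside softmax attention with its second-order Taylor polynomial, exp(x) ≈ 1 + x + x²/2, makes each attention weight a polynomial in q·k. That polynomial factorizes through an explicit feature map, so attention can be computed in time linear in the sequence length N (at the cost of a d²-sized feature dimension), while the direct quadratic form remains available for short sequences. Below is a minimal NumPy sketch of this factorization, an illustration of the general Taylor-softmax idea rather than the authors' TaylorShift implementation; the function names and the omission of the usual 1/sqrt(d) score scaling are assumptions.

```python
import numpy as np

def taylor_attention_quadratic(Q, K, V):
    """Reference O(N^2 d) version: Taylor-softmax over the full score matrix."""
    S = Q @ K.T
    W = 1.0 + S + 0.5 * S**2            # 2nd-order Taylor series of exp(x); always > 0
    W /= W.sum(axis=1, keepdims=True)   # row-normalize, as softmax would
    return W @ V

def taylor_attention_linear(Q, K, V):
    """O(N d^2) version: the same weights via an explicit feature map.

    1 + q.k + (q.k)^2 / 2 = phi(q) . phi(k) with
    phi(x) = [1, x, vec(x x^T)/sqrt(2)], so the N x N matrix is never formed.
    """
    def phi(X):                          # (N, d) -> (N, 1 + d + d^2)
        N, d = X.shape
        outer = np.einsum("ni,nj->nij", X, X).reshape(N, d * d)
        return np.concatenate([np.ones((N, 1)), X, outer / np.sqrt(2.0)], axis=1)

    Qf, Kf = phi(Q), phi(K)
    num = Qf @ (Kf.T @ V)                # sum_j w_ij v_j, without the N x N matrix
    den = Qf @ Kf.sum(axis=0)            # normalizer sum_j w_ij
    return num / den[:, None]

# Both formulations compute identical outputs up to floating-point error.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 4)) for _ in range(3))
assert np.allclose(taylor_attention_quadratic(Q, K, V),
                   taylor_attention_linear(Q, K, V))
```

The "(and Back)" in the title plausibly refers to choosing between these two formulations: the feature-map form wins when N dominates d², while the direct form is cheaper for short sequences.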
Papers citing "TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax" (3 of 3 papers shown):
1. Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks (24 May 2024) [AAML]
   Jerome Sieber, Carmen Amo Alonso, A. Didier, M. Zeilinger, Antonio Orvieto
2. On The Computational Complexity of Self-Attention (11 Sep 2022)
   Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, C. Hegde
3. Big Bird: Transformers for Longer Sequences (28 Jul 2020) [VLM]
   Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed