Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta, Albert Gu, Jonathan Berant
27 March 2022 · arXiv:2203.14343

Papers citing "Diagonal State Spaces are as Effective as Structured State Spaces" (26 of 226 papers shown)
Sequence Modeling with Multiresolution Convolutional Memory
Jiaxin Shi, Ke Alexander Wang, E. Fox
02 May 2023 · 31 / 13 / 0

Long-term Forecasting with TiDE: Time-series Dense Encoder
Abhimanyu Das, Weihao Kong, Andrew B. Leach, Shaan Mathur, Rajat Sen, Rose Yu
AI4TS · 17 Apr 2023 · 31 / 232 / 0

Effectively Modeling Time Series with Simple Discrete State Spaces
Michael Zhang, Khaled Kamal Saab, Michael Poli, Tri Dao, Karan Goel, Christopher Ré
AI4TS · 16 Mar 2023 · 14 / 30 / 0

Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Çağlar Gülçehre, Razvan Pascanu, Soham De
11 Mar 2023 · 83 / 258 / 0

Diagonal State Space Augmented Transformers for Speech Recognition
G. Saon, Ankit Gupta, Xiaodong Cui
AI4TS · 27 Feb 2023 · 14 / 26 / 0

Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré
VLM · 21 Feb 2023 · 12 / 276 / 0

Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu, Elliot L. Epstein, Eric N. D. Nguyen, A. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré
13 Feb 2023 · 11 / 51 / 0

In-Context Learning with Many Demonstration Examples
Mukai Li, Shansan Gong, Jiangtao Feng, Yiheng Xu, Jinchao Zhang, Zhiyong Wu, Lingpeng Kong
09 Feb 2023 · 30 / 26 / 0

Finding the Law: Enhancing Statutory Article Retrieval via Graph Neural Networks
Antoine Louis, Gijs van Dijck, Gerasimos Spanakis
AILaw · 30 Jan 2023 · 21 / 9 / 0

Efficient Movie Scene Detection using State-Space Transformers
Md. Mohaiminul Islam, Mahmudul Hasan, Kishan Athrey, Tony Braskich, Gedas Bertasius
ViT · 29 Dec 2022 · 23 / 44 / 0

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré
28 Dec 2022 · 20 / 238 / 0

Pretraining Without Attention
Junxiong Wang, J. Yan, Albert Gu, Alexander M. Rush
20 Dec 2022 · 17 / 48 / 0

Simplifying and Understanding State Space Models with Diagonal Linear RNNs
Ankit Gupta, Harsh Mehta, Jonathan Berant
01 Dec 2022 · 19 / 21 / 0

Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets
Edo Cohen-Karlik, Itamar Menuhin-Gruman, Raja Giryes, Nadav Cohen, Amir Globerson
25 Oct 2022 · 8 / 4 / 0

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong
3DV · 14 Oct 2022 · 37 / 9 / 0

S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces
Eric N. D. Nguyen, Karan Goel, Albert Gu, Gordon W. Downs, Preey Shah, Tri Dao, S. Baccus, Christopher Ré
VLM · 12 Oct 2022 · 8 / 24 / 0

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification
Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott
11 Oct 2022 · 30 / 33 / 0

Liquid Structural State-Space Models
Ramin Hasani, Mathias Lechner, Tsun-Hsuan Wang, Makram Chahine, Alexander Amini, Daniela Rus
AI4TS · 26 Sep 2022 · 92 / 93 / 0

Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
31 Aug 2022 · 20 / 105 / 0

Efficient Long-Text Understanding with Short-Text Models
Maor Ivgi, Uri Shaham, Jonathan Berant
VLM · 01 Aug 2022 · 11 / 75 / 0

Long Range Language Modeling via Gated State Spaces
Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur
Mamba · 27 Jun 2022 · 21 / 229 / 0

How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections
Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré
Mamba · 24 Jun 2022 · 87 / 88 / 0

On the Parameterization and Initialization of Diagonal State Space Models
Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré
23 Jun 2022 · 12 / 292 / 0

H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
Zhenhai Zhu, Radu Soricut
25 Jul 2021 · 95 / 41 / 0

Zero-Shot Text-to-Image Generation
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
VLM · 24 Feb 2021 · 253 / 4,735 / 0

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
MoE · 12 Mar 2020 · 228 / 502 / 0