ResearchTrend.AI
Hungry Hungry Hippos: Towards Language Modeling with State Space Models


28 December 2022
Daniel Y. Fu
Tri Dao
Khaled Kamal Saab
A. Thomas
Atri Rudra
Christopher Ré

Papers citing "Hungry Hungry Hippos: Towards Language Modeling with State Space Models"

50 / 284 papers shown
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Mahdi Karami
Ali Ghodsi
VLM
31
6
0
28 Feb 2024
Latent Attention for Linear Time Transformers
Rares Dolga
Marius Cobzarenco
David Barber
18
1
0
27 Feb 2024
Model Compression Method for S4 with Diagonal State Space Layers using Balanced Truncation
Haruka Ezoe
Kazuhiro Sato
23
0
0
25 Feb 2024
Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling
Raunaq M. Bhirangi
Chenyu Wang
Venkatesh Pattabiraman
Carmel Majidi
Abhinav Gupta
Tess Hellebrekers
Lerrel Pinto
48
10
0
15 Feb 2024
Hidden Traveling Waves bind Working Memory Variables in Recurrent Neural Networks
Arjun Karuvally
T. Sejnowski
H. Siegelmann
14
3
0
15 Feb 2024
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong
Xinyu Yang
Zhenyu (Allen) Zhang
Zhangyang Wang
Yuejie Chi
Beidi Chen
27
47
0
14 Feb 2024
Graph Mamba: Towards Learning on Graphs with State Space Models
Ali Behrouz
Farnoosh Hashemi
AI4CE
104
57
0
13 Feb 2024
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT
Jon Saad-Falcon
Daniel Y. Fu
Simran Arora
Neel Guha
Christopher Ré
RALM
22
15
0
12 Feb 2024
Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation
Chao Ma
Ziyang Wang
Mamba
43
21
0
11 Feb 2024
Limits of Transformer Language Models on Learning to Compose Algorithms
Jonathan Thomm
Aleksandar Terzić
Giacomo Camposampiero
Michael Hersche
Bernhard Schölkopf
Abbas Rahimi
34
3
0
08 Feb 2024
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang
Kush S. Bhatia
Hermann Kumbong
Christopher Ré
14
47
0
06 Feb 2024
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Jongho Park
Jaeseung Park
Zheyang Xiong
Nayoung Lee
Jaewoong Cho
Samet Oymak
Kangwook Lee
Dimitris Papailiopoulos
19
69
0
06 Feb 2024
Is Mamba Capable of In-Context Learning?
Riccardo Grazzi
Julien N. Siems
Simon Schrodi
Thomas Brox
Frank Hutter
24
40
0
05 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
29
26
0
05 Feb 2024
Enhancing Transformer RNNs with Multiple Temporal Perspectives
Razvan-Gabriel Dumitru
Darius Peteleaza
Mihai Surdeanu
AI4TS
6
2
0
04 Feb 2024
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces
Chloe X. Wang
Oleksii Tsepa
Jun Ma
Bo Wang
Mamba
25
85
0
01 Feb 2024
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Tokiniaina Raharison Ralambomihanta
Shahrad Mohammadzadeh
Mohammad Sami Nur Islam
Wassim Jabbour
Laurence Liang
13
3
0
31 Jan 2024
Transformers and Cortical Waves: Encoders for Pulling In Context Across Time
L. Muller
P. Churchland
T. Sejnowski
19
6
0
25 Jan 2024
MambaByte: Token-free Selective State Space Model
Junxiong Wang
Tushaar Gangavarapu
Jing Nathan Yan
Alexander M. Rush
Mamba
20
34
0
24 Jan 2024
In-Context Language Learning: Architectures and Algorithms
Ekin Akyürek
Bailin Wang
Yoon Kim
Jacob Andreas
LRM
ReLM
40
40
0
23 Jan 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
32
699
0
17 Jan 2024
MambaTab: A Plug-and-Play Model for Learning Tabular Data
Md. Atik Ahamed
Qiang Cheng
Mamba
LMTD
23
12
0
16 Jan 2024
SpiNNaker2: A Large-Scale Neuromorphic System for Event-Based and Asynchronous Machine Learning
Hector A. Gonzalez
Jiaxin Huang
Florian Kelber
Khaleelulla Khan Nazeer
Tim Langer
...
Bernhard Vogginger
Timo C. Wunderlich
Yexin Yan
Mahmoud Akl
Christian Mayr
19
15
0
09 Jan 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
53
75
0
23 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
19
1
0
18 Dec 2023
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
E. Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLM
LRM
AI4CE
22
74
0
17 Dec 2023
Learning Long Sequences in Spiking Neural Networks
Matei Ioan Stan
Oliver Rhodes
30
10
0
14 Dec 2023
Spectral State Space Models
Naman Agarwal
Daniel Suo
Xinyi Chen
Elad Hazan
17
11
0
11 Dec 2023
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang
Bailin Wang
Yikang Shen
Rameswar Panda
Yoon Kim
40
138
0
11 Dec 2023
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
Aleksandar Terzić
Michael Hersche
G. Karunaratne
Zixiao Huang
Abu Sebastian
Abbas Rahimi
AI4TS
12
1
0
09 Dec 2023
MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition
Nicolas Menet
Michael Hersche
G. Karunaratne
Luca Benini
Abu Sebastian
Abbas Rahimi
20
13
0
05 Dec 2023
Recurrent Distance Filtering for Graph Representation Learning
Yuhui Ding
Antonio Orvieto
Bobby He
Thomas Hofmann
GNN
27
6
0
03 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
27
21
0
01 Dec 2023
Diffusion Models Without Attention
Jing Nathan Yan
Jiatao Gu
Alexander M. Rush
19
60
0
30 Nov 2023
On the Long Range Abilities of Transformers
Itamar Zimerman
Lior Wolf
19
7
0
28 Nov 2023
StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization
Shida Wang
Qianxiao Li
17
12
0
24 Nov 2023
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Sunyi Zheng
Lin Yang
VLM
25
4
0
21 Nov 2023
Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference
Rishav Mukherji
Mark Schöne
Khaleelulla Khan Nazeer
Christian Mayr
Anand Subramoney
17
2
0
13 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu
Hermann Kumbong
Eric N. D. Nguyen
Christopher Ré
VLM
28
28
0
10 Nov 2023
Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Zhen Qin
Songlin Yang
Yiran Zhong
36
72
0
08 Nov 2023
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
Stefano Massaroli
Michael Poli
Daniel Y. Fu
Hermann Kumbong
Rom N. Parnichkun
...
Atri Rudra
Ce Zhang
Christopher Ré
Stefano Ermon
Yoshua Bengio
29
19
0
28 Oct 2023
Hieros: Hierarchical Imagination on Structured State Space Sequence World Models
Paul Mattes
Rainer Schlosser
R. Herbrich
16
4
0
08 Oct 2023
Parameter-Saving Adversarial Training: Reinforcing Multi-Perturbation Robustness via Hypernetworks
Huihui Gong
Minjing Dong
Siqi Ma
S. Çamtepe
Surya Nepal
Chang Xu
AAML
OOD
13
1
0
28 Sep 2023
Multi-Dimensional Hyena for Spatial Inductive Bias
Itamar Zimerman
Lior Wolf
ViT
20
4
0
24 Sep 2023
State-space Models with Layer-wise Nonlinearity are Universal Approximators with Exponential Decaying Memory
Shida Wang
Beichen Xue
11
22
0
23 Sep 2023
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
Aleksandar Stanić
Dylan R. Ashley
Oleg Serikov
Louis Kirsch
Francesco Faccio
Jürgen Schmidhuber
Thomas Hofmann
Imanol Schlag
MoE
38
9
0
20 Sep 2023
Augmenting conformers with structured state-space sequence models for online speech recognition
Haozhe Shan
Albert Gu
Zhong Meng
Weiran Wang
Krzysztof Choromanski
Tara N. Sainath
RALM
16
4
0
15 Sep 2023
Advancing Regular Language Reasoning in Linear Recurrent Neural Networks
Ting-Han Fan
Ta-Chung Chi
Alexander I. Rudnicky
LRM
22
5
0
14 Sep 2023
Uncovering mesa-optimization algorithms in Transformers
J. Oswald
Eyvind Niklasson
Maximilian Schlegel
Seijin Kobayashi
Nicolas Zucchet
...
Mark Sandler
Blaise Agüera y Arcas
Max Vladymyrov
Razvan Pascanu
João Sacramento
15
53
0
11 Sep 2023
CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra
Andres Potapczynski
Marc Finzi
Geoff Pleiss
Andrew Gordon Wilson
20
7
0
06 Sep 2023