ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.14052
  4. Cited By
Hungry Hungry Hippos: Towards Language Modeling with State Space Models

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

28 December 2022
Daniel Y. Fu
Tri Dao
Khaled Kamal Saab
A. Thomas
Atri Rudra
Christopher Ré
ArXivPDFHTML

Papers citing "Hungry Hungry Hippos: Towards Language Modeling with State Space Models"

34 / 284 papers shown
Title
Gated recurrent neural networks discover attention
Gated recurrent neural networks discover attention
Nicolas Zucchet
Seijin Kobayashi
Yassir Akram
J. Oswald
Maxime Larcher
Angelika Steger
João Sacramento
23
8
0
04 Sep 2023
Sparks of Large Audio Models: A Survey and Outlook
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
27
36
0
24 Aug 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language
  Models
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM
ALM
30
132
0
20 Jul 2023
Retentive Network: A Successor to Transformer for Large Language Models
Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
LRM
34
300
0
17 Jul 2023
Lost in the Middle: How Language Models Use Long Contexts
Lost in the Middle: How Language Models Use Long Contexts
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Percy Liang
RALM
27
1,380
0
06 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
21
149
0
05 Jul 2023
Facing Off World Model Backbones: RNNs, Transformers, and S4
Facing Off World Model Backbones: RNNs, Transformers, and S4
Fei Deng
Junyeong Park
Sungjin Ahn
19
24
0
05 Jul 2023
Hyena Neural Operator for Partial Differential Equations
Hyena Neural Operator for Partial Differential Equations
Saurabh Patil
Zijie Li
Amir Barati Farimani
AI4CE
17
4
0
28 Jun 2023
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide
  Resolution
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Eric N. D. Nguyen
Michael Poli
Marjan Faizi
A. Thomas
Callum Birch-Sykes
...
Stefano Massaroli
Yoshua Bengio
Stefano Ermon
S. Baccus
Christopher Ré
MedIm
4
212
0
27 Jun 2023
Long-range Language Modeling with Self-retrieval
Long-range Language Modeling with Self-retrieval
Ohad Rubin
Jonathan Berant
RALM
KELM
14
18
0
23 Jun 2023
Sparse Modular Activation for Efficient Sequence Modeling
Sparse Modular Activation for Efficient Sequence Modeling
Liliang Ren
Yang Liu
Shuohang Wang
Yichong Xu
Chenguang Zhu
Chengxiang Zhai
43
13
0
19 Jun 2023
Block-State Transformers
Block-State Transformers
Mahan Fathi
Jonathan Pilault
Orhan Firat
C. Pal
Pierre-Luc Bacon
Ross Goroshin
17
17
0
15 Jun 2023
2-D SSM: A General Spatial Layer for Visual Transformers
2-D SSM: A General Spatial Layer for Visual Transformers
Ethan Baron
Itamar Zimerman
Lior Wolf
20
14
0
11 Jun 2023
Exposing Attention Glitches with Flip-Flop Language Modeling
Exposing Attention Glitches with Flip-Flop Language Modeling
Bingbin Liu
Jordan T. Ash
Surbhi Goel
A. Krishnamurthy
Cyril Zhang
LRM
24
46
0
01 Jun 2023
Online learning of long-range dependencies
Online learning of long-range dependencies
Nicolas Zucchet
Robert Meier
Simon Schug
Asier Mujika
João Sacramento
CLL
36
18
0
25 May 2023
Focus Your Attention (with Adaptive IIR Filters)
Focus Your Attention (with Adaptive IIR Filters)
Shahar Lutati
Itamar Zimerman
Lior Wolf
21
9
0
24 May 2023
Neural Machine Translation for Code Generation
Neural Machine Translation for Code Generation
K. Dharma
Clayton T. Morrison
25
4
0
22 May 2023
RWKV: Reinventing RNNs for the Transformer Era
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
72
550
0
22 May 2023
SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric
  Kernels
SKI to go Faster: Accelerating Toeplitz Neural Networks via Asymmetric Kernels
Alexander Moreno
Jonathan Mei
Luke Walters
8
0
0
15 May 2023
BranchNorm: Robustly Scaling Extremely Deep Transformers
BranchNorm: Robustly Scaling Extremely Deep Transformers
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
19
3
0
04 May 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
11
273
0
28 Apr 2023
State Spaces Aren't Enough: Machine Translation Needs Attention
State Spaces Aren't Enough: Machine Translation Needs Attention
Ali Vardasbi
Telmo Pires
Robin M. Schmidt
Stephan Peitz
8
9
0
25 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoE
ALM
10
41
0
17 Apr 2023
Resurrecting Recurrent Neural Networks for Long Sequences
Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto
Samuel L. Smith
Albert Gu
Anushan Fernando
Çağlar Gülçehre
Razvan Pascanu
Soham De
88
258
0
11 Mar 2023
Hyena Hierarchy: Towards Larger Convolutional Language Models
Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli
Stefano Massaroli
Eric Q. Nguyen
Daniel Y. Fu
Tri Dao
S. Baccus
Yoshua Bengio
Stefano Ermon
Christopher Ré
VLM
17
276
0
21 Feb 2023
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu
Elliot L. Epstein
Eric N. D. Nguyen
A. Thomas
Michael Zhang
Tri Dao
Atri Rudra
Christopher Ré
11
51
0
13 Feb 2023
In-Context Learning with Many Demonstration Examples
In-Context Learning with Many Demonstration Examples
Mukai Li
Shansan Gong
Jiangtao Feng
Yiheng Xu
Jinchao Zhang
Zhiyong Wu
Lingpeng Kong
32
32
0
09 Feb 2023
Pretraining Without Attention
Pretraining Without Attention
Junxiong Wang
J. Yan
Albert Gu
Alexander M. Rush
19
48
0
20 Dec 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
27
2,297
0
09 Nov 2022
In-context Learning and Induction Heads
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
240
453
0
24 Sep 2022
On The Computational Complexity of Self-Attention
On The Computational Complexity of Self-Attention
Feyza Duman Keles
Pruthuvi Maheshakya Wijewardena
C. Hegde
60
107
0
11 Sep 2022
How to Train Your HiPPO: State Space Models with Generalized Orthogonal
  Basis Projections
How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections
Albert Gu
Isys Johnson
Aman Timalsina
Atri Rudra
Christopher Ré
Mamba
93
88
0
24 Jun 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,977
0
31 Dec 2020
Fine-grain atlases of functional modes for fMRI analysis
Fine-grain atlases of functional modes for fMRI analysis
Kamalaker Dadi
Gaël Varoquaux
Antonia Machlouzarides-Shalit
Krzysztof J. Gorgolewski
Demian Wassermann
B. Thirion
A. Mensch
AI4CE
18
87
0
05 Mar 2020
Previous
123456