State-space models can learn in-context by gradient descent
Neeraj Mohan Sushma, Yudou Tian, Harshvardhan Mestha, Nicolo Colombo, David Kappel, Anand Subramoney
arXiv:2410.11687, 15 October 2024
Papers citing "State-space models can learn in-context by gradient descent" (26 papers)
From Markov to Laplace: How Mamba In-Context Learns Markov Chains
Marco Bondaschi, Nived Rajaraman, Xiuying Wei, Kannan Ramchandran, Razvan Pascanu, Çağlar Gülçehre, Michael C. Gastpar, Ashok Vardhan Makkuva
304 | 4 | 0 | 17 Feb 2025

Longhorn: State Space Models are Amortized Online Learners
Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qian Liu
347 | 28 | 0 | 19 Jul 2024

Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, ..., Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin
573 | 176 | 0 | 05 Jul 2024

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren, Yang Liu, Yadong Lu, Haoran Pan, Chen Liang, Weizhu Chen [Mamba]
315 | 109 | 0 | 11 Jun 2024

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao, Albert Gu [Mamba]
377 | 998 | 0 | 31 May 2024

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George-Christian Muraru, ..., David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Çağlar Gülçehre [Mamba]
239 | 185 | 0 | 29 Feb 2024

The dynamic interplay between in-context and in-weight learning in humans and neural networks
Jacob Russin, Ellie Pavlick, Michael J. Frank
229 | 4 | 0 | 13 Feb 2024

Is Mamba Capable of In-Context Learning?
Riccardo Grazzi, Julien N. Siems, Simon Schrodi, Thomas Brox, Frank Hutter
187 | 55 | 0 | 05 Feb 2024

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao [Mamba]
522 | 4,974 | 0 | 01 Dec 2023

In-Context Learning Creates Task Vectors (EMNLP 2023)
Roee Hendel, Mor Geva, Amir Globerson
325 | 231 | 0 | 24 Oct 2023

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations (ICLR 2023)
Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
215 | 74 | 0 | 16 Oct 2023

Are Emergent Abilities in Large Language Models just In-Context Learning? (ACL 2023)
Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych [LRM, ELM, ReLM]
415 | 131 | 0 | 04 Sep 2023

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning (ACL 2023)
Jane Pan, Tianyu Gao, Howard Chen, Danqi Chen
177 | 158 | 0 | 16 May 2023

The Learnability of In-Context Learning (NeurIPS 2023)
Noam Wies, Yoav Levine, Amnon Shashua
280 | 153 | 0 | 14 Mar 2023

Resurrecting Recurrent Neural Networks for Long Sequences (ICML 2023)
Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Çağlar Gülçehre, Razvan Pascanu, Soham De
477 | 405 | 0 | 11 Mar 2023

Larger language models do in-context learning differently
Jerry W. Wei, Jason W. Wei, Yi Tay, Dustin Tran, Albert Webson, ..., Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma [ReLM, LRM]
363 | 428 | 0 | 07 Mar 2023

Hyena Hierarchy: Towards Larger Convolutional Language Models (ICML 2023)
Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré [VLM]
507 | 411 | 0 | 21 Feb 2023

Transformers learn in-context by gradient descent (ICML 2022)
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov [MLT]
466 | 634 | 0 | 15 Dec 2022

What learning algorithm is in-context learning? Investigations with linear models (ICLR 2022)
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou
474 | 602 | 0 | 28 Nov 2022

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes (NeurIPS 2022)
Shivam Garg, Dimitris Tsipras, Abigail Z. Jacobs, Gregory Valiant
557 | 661 | 0 | 01 Aug 2022

Efficiently Modeling Long Sequences with Structured State Spaces (ICLR 2021)
Albert Gu, Karan Goel, Christopher Ré
886 | 2,764 | 0 | 31 Oct 2021

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
632 | 2,252 | 0 | 29 Jun 2020

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan [RALM, VLM]
612 | 4,805 | 0 | 10 Apr 2020

Weighted Sigmoid Gate Unit for an Activation Function of Deep Neural Network
Masayuki Tanaka
153 | 59 | 0 | 03 Oct 2018

Attention Is All You Need (NeurIPS 2017)
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin [3DV]
2.6K | 158,651 | 0 | 12 Jun 2017

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn, Pieter Abbeel, Sergey Levine [OOD]
1.6K | 13,353 | 0 | 09 Mar 2017