State-space models can learn in-context by gradient descent
Neeraj Mohan Sushma, Yudou Tian, Harshvardhan Mestha, Nicolo Colombo, David Kappel, Anand Subramoney
arXiv:2410.11687, 15 October 2024

Papers citing "State-space models can learn in-context by gradient descent"

26 papers

From Markov to Laplace: How Mamba In-Context Learns Markov Chains
Marco Bondaschi, Nived Rajaraman, Xiuying Wei, Kannan Ramchandran, Razvan Pascanu, Çağlar Gülçehre, Michael C. Gastpar, Ashok Vardhan Makkuva (17 Feb 2025)

Longhorn: State Space Models are Amortized Online Learners
Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qian Liu (19 Jul 2024)

Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, ..., Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin (05 Jul 2024)

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren, Yang Liu, Yadong Lu, Haoran Pan, Chen Liang, Weizhu Chen (11 Jun 2024)

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao, Albert Gu (31 May 2024)

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George-Christian Muraru, ..., David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Çağlar Gülçehre (29 Feb 2024)

The dynamic interplay between in-context and in-weight learning in humans and neural networks
Jacob Russin, Ellie Pavlick, Michael J. Frank (13 Feb 2024)

Is Mamba Capable of In-Context Learning?
Riccardo Grazzi, Julien N. Siems, Simon Schrodi, Thomas Brox, Frank Hutter (05 Feb 2024)

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao (01 Dec 2023)

In-Context Learning Creates Task Vectors (EMNLP 2023)
Roee Hendel, Mor Geva, Amir Globerson (24 Oct 2023)

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations (ICLR 2023)
Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai (16 Oct 2023)

Are Emergent Abilities in Large Language Models just In-Context Learning? (ACL 2023)
Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych (04 Sep 2023)

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning (ACL 2023)
Jane Pan, Tianyu Gao, Howard Chen, Danqi Chen (16 May 2023)

The Learnability of In-Context Learning (NeurIPS 2023)
Noam Wies, Yoav Levine, Amnon Shashua (14 Mar 2023)

Resurrecting Recurrent Neural Networks for Long Sequences (ICML 2023)
Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Çağlar Gülçehre, Razvan Pascanu, Soham De (11 Mar 2023)

Larger language models do in-context learning differently
Jerry W. Wei, Jason W. Wei, Yi Tay, Dustin Tran, Albert Webson, ..., Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma (07 Mar 2023)

Hyena Hierarchy: Towards Larger Convolutional Language Models (ICML 2023)
Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré (21 Feb 2023)

Transformers learn in-context by gradient descent (ICML 2023)
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov (15 Dec 2022)

What learning algorithm is in-context learning? Investigations with linear models (ICLR 2023)
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou (28 Nov 2022)

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes (NeurIPS 2022)
Shivam Garg, Dimitris Tsipras, Abigail Z. Jacobs, Gregory Valiant (01 Aug 2022)

Efficiently Modeling Long Sequences with Structured State Spaces (ICLR 2022)
Albert Gu, Karan Goel, Christopher Ré (31 Oct 2021)

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret (29 Jun 2020)

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan (10 Apr 2020)

Weighted Sigmoid Gate Unit for an Activation Function of Deep Neural Network
Masayuki Tanaka (03 Oct 2018)

Attention Is All You Need (NeurIPS 2017)
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin (12 Jun 2017)

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn, Pieter Abbeel, Sergey Levine (09 Mar 2017)