2408.16568
AxLSTMs: learning self-supervised audio representations with xLSTMs
29 August 2024
Sarthak Yadav
Sergios Theodoridis
Zheng-Hua Tan
Papers citing
"AxLSTMs: learning self-supervised audio representations with xLSTMs"
31 / 31 papers shown
Title
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
Nikolai Lund Kühne
Jesper Jensen
Jan Østergaard
Zheng-Hua Tan
Mamba
253
1
0
01 Jul 2025
Vision-LSTM: xLSTM as Generic Vision Backbone
International Conference on Learning Representations (ICLR), 2024
Benedikt Alkin
M. Beck
Korbinian Pöppel
Sepp Hochreiter
Johannes Brandstetter
VLM
448
80
0
24 Feb 2025
BabyHGRN: Exploring RNNs for Sample-Efficient Training of Language Models
Patrick Haller
Jonas Golde
Alan Akbik
243
0
0
20 Dec 2024
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
Sarthak Yadav
Zheng-Hua Tan
Mamba
215
27
0
04 Jun 2024
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao
Albert Gu
Mamba
377
1,002
0
31 May 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
International Conference on Machine Learning (ICML), 2024
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
419
1,322
0
17 Jan 2024
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu
Tri Dao
Mamba
526
5,020
0
01 Dec 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
International Conference on Learning Representations (ICLR), 2023
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan
210
14
0
01 Jun 2023
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
International Conference on Learning Representations (ICLR), 2022
Daniel Y. Fu
Tri Dao
Khaled Kamal Saab
A. Thomas
Atri Rudra
Christopher Ré
366
548
0
28 Dec 2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
International Conference on Machine Learning (ICML), 2022
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
338
455
0
18 Dec 2022
Masked Autoencoders that Listen
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
472
384
0
13 Jul 2022
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
180
80
0
26 Apr 2022
HEAR: Holistic Evaluation of Audio Representations
Neural Information Processing Systems (NeurIPS), 2022
Joseph P. Turian
Jordie Shier
H. Khan
Bhiksha Raj
Björn W. Schuller
...
P. Esling
Pranay Manocha
Shinji Watanabe
Zeyu Jin
Yonatan Bisk
351
133
0
06 Mar 2022
SimMIM: A Simple Framework for Masked Image Modeling
Zhenda Xie
Zheng Zhang
Yue Cao
Yutong Lin
Jianmin Bao
Zhuliang Yao
Jingdong Sun
Han Hu
397
1,622
0
18 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Computer Vision and Pattern Recognition (CVPR), 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
1.9K
9,902
0
11 Nov 2021
Efficiently Modeling Long Sequences with Structured State Spaces
International Conference on Learning Representations (ICLR), 2021
Albert Gu
Karan Goel
Christopher Ré
914
2,777
0
31 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
967
2,597
0
26 Oct 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
291
352
0
19 Oct 2021
Efficient Training of Audio Transformers with Patchout
Interspeech (Interspeech), 2021
Khaled Koutini
Jan Schlüter
Hamid Eghbalzadeh
Gerhard Widmer
ViT
486
342
0
11 Oct 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Wei-Ning Hsu
Benjamin Bolte
Yifan Hao
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
496
3,946
0
14 Jun 2021
MLP-Mixer: An all-MLP Architecture for Vision
Neural Information Processing Systems (NeurIPS), 2021
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
1.1K
3,237
0
04 May 2021
SUPERB: Speech processing Universal PERformance Benchmark
Interspeech (Interspeech), 2021
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
433
1,069
0
03 May 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
1.3K
54,134
0
22 Oct 2020
FSD50K: An Open Dataset of Human-Labeled Sound Events
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Eduardo Fonseca
Xavier Favory
Jordi Pons
F. Font
Xavier Serra
452
594
0
01 Oct 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
648
2,262
0
29 Jun 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
1.6K
7,296
0
20 Jun 2020
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
2.9K
107,304
0
11 Oct 2018
Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
Pete Warden
206
1,832
0
09 Apr 2018
Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation
Fabian-Robert Stöter
Soumitro Chakrabarty
B. Edler
Emanuel Habets
BDL
206
39
0
12 Dec 2017
Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
2.7K
159,090
0
12 Jun 2017
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Jesse Engel
Cinjon Resnick
Adam Roberts
Sander Dieleman
Douglas Eck
Karen Simonyan
Mohammad Norouzi
235
696
0
05 Apr 2017