ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

AxLSTMs: learning self-supervised audio representations with xLSTMs

29 August 2024
Sarthak Yadav, Sergios Theodoridis, Zheng-Hua Tan
arXiv: 2408.16568 (latest version: v4)

Papers cited by "AxLSTMs: learning self-supervised audio representations with xLSTMs"

MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement
Nikolai Lund Kühne, Jesper Jensen, Jan Østergaard, Zheng-Hua Tan
01 Jul 2025

Vision-LSTM: xLSTM as Generic Vision Backbone
International Conference on Learning Representations (ICLR), 2025
Benedikt Alkin, M. Beck, Korbinian Pöppel, Sepp Hochreiter, Johannes Brandstetter
24 Feb 2025

BabyHGRN: Exploring RNNs for Sample-Efficient Training of Language Models
Patrick Haller, Jonas Golde, Alan Akbik
20 Dec 2024

Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
Sarthak Yadav, Zheng-Hua Tan
04 Jun 2024

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao, Albert Gu
31 May 2024

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
International Conference on Machine Learning (ICML), 2024
Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang
17 Jan 2024

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao
01 Dec 2023

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
International Conference on Learning Representations (ICLR), 2024
Sarthak Yadav, Sergios Theodoridis, Lars Kai Hansen, Zheng-Hua Tan
01 Jun 2023

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
International Conference on Learning Representations (ICLR), 2023
Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré
28 Dec 2022

BEATs: Audio Pre-Training with Acoustic Tokenizers
International Conference on Machine Learning (ICML), 2023
Sanyuan Chen, Yu-Huan Wu, Chengyi Wang, Shujie Liu, Daniel C. Tompkins, Zhuo Chen, Furu Wei
18 Dec 2022

Masked Autoencoders that Listen
Neural Information Processing Systems (NeurIPS), 2022
Po-Yao (Bernie) Huang, Hu Xu, Juncheng Billy Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer
13 Jul 2022

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, K. Kashino
26 Apr 2022

HEAR: Holistic Evaluation of Audio Representations
Neural Information Processing Systems (NeurIPS), 2022
Joseph P. Turian, Jordie Shier, H. Khan, Bhiksha Raj, Björn W. Schuller, ..., P. Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk
06 Mar 2022

SimMIM: A Simple Framework for Masked Image Modeling
Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Jingdong Sun, Han Hu
18 Nov 2021

Masked Autoencoders Are Scalable Vision Learners
Computer Vision and Pattern Recognition (CVPR), 2022
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick
11 Nov 2021

Efficiently Modeling Long Sequences with Structured State Spaces
International Conference on Learning Representations (ICLR), 2022
Albert Gu, Karan Goel, Christopher Ré
31 Oct 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu-Huan Wu, Shujie Liu, ..., Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei
26 Oct 2021

SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James R. Glass
19 Oct 2021

Efficient Training of Audio Transformers with Patchout
Interspeech, 2022
Khaled Koutini, Jan Schlüter, Hamid Eghbalzadeh, Gerhard Widmer
11 Oct 2021

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021
Wei-Ning Hsu, Benjamin Bolte, Yifan Hao, Kushal Lakhotia, Ruslan Salakhutdinov, Abdel-rahman Mohamed
14 Jun 2021

MLP-Mixer: An all-MLP Architecture for Vision
Neural Information Processing Systems (NeurIPS), 2021
Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
04 May 2021

SUPERB: Speech processing Universal PERformance Benchmark
Interspeech, 2021
Shu-Wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, ..., Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdel-rahman Mohamed, Hung-yi Lee
03 May 2021

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
22 Oct 2020

FSD50K: An Open Dataset of Human-Labeled Sound Events
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2022
Eduardo Fonseca, Xavier Favory, Jordi Pons, F. Font, Xavier Serra
01 Oct 2020

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
29 Jun 2020

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli
20 Jun 2020

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
11 Oct 2018

Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
Pete Warden
09 Apr 2018

Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation
Fabian-Robert Stöter, Soumitro Chakrabarty, B. Edler, Emanuel Habets
12 Dec 2017

Attention Is All You Need
Neural Information Processing Systems (NeurIPS), 2017
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017

Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, Mohammad Norouzi
05 Apr 2017