ResearchTrend.AI
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

1 June 2023
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan

Papers citing "Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners"

9 / 9 papers shown
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
59
0
0
20 Mar 2025
How not to Stitch Representations to Measure Similarity: Task Loss Matching versus Direct Matching
András Balogh
Márk Jelasity
85
0
0
15 Dec 2024
BiSSL: A Bilevel Optimization Framework for Enhancing the Alignment Between Self-Supervised Pre-Training and Downstream Fine-Tuning
Gustav Wagner Zakarias
Lars Kai Hansen
Zheng-Hua Tan
32
0
0
03 Oct 2024
Audio xLSTMs: Learning Self-Supervised Audio Representations with xLSTMs
Sarthak Yadav
Sergios Theodoridis
Zheng-Hua Tan
45
2
0
29 Aug 2024
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
Sarthak Yadav
Zheng-Hua Tan
Mamba
42
10
0
04 Jun 2024
Masked World Models for Visual Control
Younggyo Seo
Danijar Hafner
Hao Liu
Fangchen Liu
Stephen James
Kimin Lee
Pieter Abbeel
OffRL
87
146
0
28 Jun 2022
BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping
Gasser Elbanna
Neil Scheidwasser
M. Kegler
P. Beckmann
Karl El Hajal
Milos Cernak
SSL
31
21
0
24 Jun 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
121
264
0
02 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,443
0
11 Nov 2021