2306.04186
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
7 June 2023
Xian Li
Nian Shao
Xiaofei Li
ViT
CLIP
Papers citing "Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks"
21 / 21 papers shown
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
Paul Primus
Florian Schmid
Gerhard Widmer
CLIP
AI4TS
VLM
26
0
0
12 May 2025
Token Pruning in Audio Transformers: Optimizing Performance and Decoding Patch Importance
Taehan Lee
Hyukjun Lee
ViT
VLM
37
0
0
02 Apr 2025
Exploring Performance-Complexity Trade-Offs in Sound Event Detection
T. Morocutti
Florian Schmid
Jonathan Greif
Francesco Foscarin
Gerhard Widmer
36
0
0
14 Mar 2025
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
Aurian Quélennec
Pierre Chouteau
Geoffroy Peeters
S. Essid
SSL
52
0
0
17 Feb 2025
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
Di Liang
Xiaofei Li
19
0
0
09 Oct 2024
Effective Pre-Training of Audio Transformers for Sound Event Detection
Florian Schmid
T. Morocutti
Francesco Foscarin
Jan Schluter
Paul Primus
Gerhard Widmer
ViT
23
2
0
14 Sep 2024
Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers
Qian Wang
Zhaoyang Bu
Jiaxuan Mao
Wenyu Zhu
Jingya Zhao
Wei Du
Guochao Shi
Min Zhou
Si Chen
Jieming Qu
MedIm
25
0
0
28 Aug 2024
Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval
Paul Primus
Florian Schmid
Gerhard Widmer
29
2
0
21 Aug 2024
Improving Audio Spectrogram Transformers for Sound Event Detection Through Multi-Stage Training
Florian Schmid
Paul Primus
T. Morocutti
Jonathan Greif
Gerhard Widmer
24
5
0
17 Jul 2024
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
Anbai Jiang
Bing Han
Zhiqiang Lv
Yufeng Deng
Wei-Qiang Zhang
Xie Chen
Yanmin Qian
Jia Liu
Pingyi Fan
27
3
0
17 Jun 2024
Scaling up masked audio encoder learning for general audio classification
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
22
2
0
11 Jun 2024
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
Masahiro Yasuda
Shunsuke Tsubaki
Keisuke Imoto
VLM
31
5
0
04 Jun 2024
Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection
Haobo Yue
Zhicheng Zhang
Da Mu
Yonghao Dang
Jianqin Yin
Jin Tang
15
0
0
10 Jan 2024
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
Wenxi Chen
Yuzhe Liang
Ziyang Ma
Zhisheng Zheng
Xie Chen
ViT
35
17
0
07 Jan 2024
Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer
Bing Yang
Xiaofei Li
SSL
17
3
0
01 Dec 2023
Fine-tune the pretrained ATST model for sound event detection
Nian Shao
Xian Li
Xiaofei Li
16
26
0
15 Sep 2023
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
114
264
0
02 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,412
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
292
5,761
0
29 Apr 2021
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
Yuan Gong
Yu-An Chung
James R. Glass
VLM
99
144
0
02 Feb 2021
A Framework for the Robust Evaluation of Sound Event Detection
Cagdas Bilen
Giacomo Ferroni
Francesco Tuveri
Juan Azcarreta
Sacha Krstulović
32
162
0
18 Oct 2019