1912.10211
Cited By
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
Github (1475★)
Papers citing
"PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"
50 / 545 papers shown
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Zhe Ye
Wei Xue
Xuejiao Tan
Jie Chen
Qi-fei Liu
Yi-Ting Guo
DiffM
95
46
0
11 May 2023
Extending Audio Masked Autoencoders Toward Audio Restoration
Zhi-Wei Zhong
Hao Shi
M. Hirano
Kazuki Shimada
Kazuya Tateishi
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
67
6
0
11 May 2023
Joint Moment Retrieval and Highlight Detection Via Natural Language Queries
Richard Luo
Austin Peng
Heidi Yap
Koby Beard
ViT
55
0
0
08 May 2023
Compressing audio CNNs with graph centrality based filter pruning
James A. King
Ashutosh Kumar Singh
Mark D. Plumbley
GNN
19
2
0
05 May 2023
Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations
Vasudha Kowtha
Miquel Espi Marques
Jonathan Huang
Yichi Zhang
C. Avendaño
AI4TS
61
0
0
03 May 2023
Unsupervised Improvement of Audio-Text Cross-Modal Representations
Zhepei Wang
Cem Subakan
Krishna Subramani
Junkai Wu
Tiago Tavares
Fabio Ayres
Paris Smaragdis
SSL
81
2
0
03 May 2023
Self-supervised learning for infant cry analysis
Arsenii Gorin
Cem Subakan
Sajjad Abdoli
Junhao Wang
Samantha Latremouille
Charles C. Onu
60
10
0
02 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
92
5
0
02 May 2023
Adversarial Representation Learning for Robust Privacy Preservation in Audio
Shayan Gharib
Minh Tran
Diep Luong
Konstantinos Drossos
Tuomas Virtanen
AAML
43
5
0
29 Apr 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
60
7
0
22 Apr 2023
Robust Cross-Modal Knowledge Distillation for Unconstrained Videos
Wenke Xia
Xingjian Li
Andong Deng
Haoyi Xiong
Dejing Dou
Di Hu
67
5
0
16 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
110
2
0
12 Apr 2023
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
64
8
0
07 Apr 2023
Efficient CNNs via Passive Filter Pruning
Arshdeep Singh
Mark D. Plumbley
46
1
0
05 Apr 2023
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
Yuancheng Wang
Zeqian Ju
Xuejiao Tan
Lei He
Zhizheng Wu
Jiang Bian
Sheng Zhao
DiffM
150
55
0
03 Apr 2023
Prefix tuning for automated audio captioning
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
100
45
0
30 Mar 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
178
220
0
30 Mar 2023
Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
Bei Gan
Xiujun Shu
Ruizhi Qiao
Haoqian Wu
Keyun Chen
Hanjun Li
Bohan Ren
53
5
0
26 Mar 2023
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
WonJun Moon
Sangeek Hyun
S. Park
Dongchan Park
Jae-Pil Heo
ViT
105
115
0
24 Mar 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Wenjie Zhu
M. Omar
85
22
0
19 Mar 2023
Audio-Text Models Do Not Yet Leverage Natural Language
Ho-Hsiang Wu
Oriol Nieto
J. P. Bello
Justin Salamon
VLM
74
33
0
19 Mar 2023
Weight-sharing Supernet for Searching Specialized Acoustic Event Classification Networks Across Device Constraints
Guan-Ting Lin
Qingming Tang
Chieh-Chi Kao
Viktor Rozgic
Chao Wang
87
1
0
18 Mar 2023
Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation
Yulin Pan
Xiangteng He
Biao Gong
Yuxin Peng
Yiliang Lv
SSL
46
0
0
15 Mar 2023
Target Sound Extraction with Variable Cross-modality Clues
Chenda Li
Yao Qian
Zhuo Chen
Dongmei Wang
Takuya Yoshioka
Shujie Liu
Y. Qian
Michael Zeng
VLM
68
14
0
15 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
164
15
0
14 Mar 2023
CAT: Causal Audio Transformer for Audio Classification
Xiaoyu Liu
Hanlin Lu
Jianbo Yuan
Xinyu Li
ViT
83
24
0
14 Mar 2023
Improving Text-Audio Retrieval by Text-aware Attention Pooling and Prior Matrix Revised Loss
Yifei Xin
Dongchao Yang
Yuexian Zou
109
31
0
10 Mar 2023
Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning
Xiaobao Guo
Nithish Muthuchamy Selvaraj
Zitong Yu
A. Kong
Bingquan Shen
Alex C. Kot
40
10
0
09 Mar 2023
Onsets and Velocities: Affordable Real-Time Piano Transcription Using Convolutional Neural Networks
Andres Fernandez
40
3
0
08 Mar 2023
Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study
Yiitan Yuan
Haohe Liu
Jinhua Liang
Xubo Liu
Mark D. Plumbley
Wenwu Wang
52
10
0
07 Mar 2023
AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer
Kang Li
Yan Song
Lirong Dai
Ian Mcloughlin
Xin Fang
Lin Liu
78
22
0
07 Mar 2023
Heterogeneous Graph Learning for Acoustic Event Classification
A. Shirian
Mona Ahmadian
Krishna Somandepalli
T. Guha
71
2
0
05 Mar 2023
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
46
4
0
03 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
64
4
0
03 Mar 2023
Incremental Learning of Acoustic Scenes and Sound Events
Manjunath Mulimani
A. Mesaros
CLL
57
7
0
28 Feb 2023
Data leakage in cross-modal retrieval training: A case study
Benno Weck
Xavier Serra
61
7
0
23 Feb 2023
Improving Speech Enhancement via Event-based Query
Yifei Xin
Xiulian Peng
Yan Lu
61
6
0
20 Feb 2023
An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification
Zhi-Wei Zhong
M. Hirano
Kazuki Shimada
Kazuya Tateishi
Shusuke Takahashi
Yuki Mitsufuji
68
12
0
16 Feb 2023
Personalized Audio Quality Preference Prediction
Chung-Che Wang
Yu-Chun Lin
Yu-Teng Hsu
J. Jang
56
1
0
16 Feb 2023
Unsupervised classification to improve the quality of a bird song recording dataset
Félix Michaud
J. Sueur
Maxime LE Cesne
S. Haupert
55
32
0
15 Feb 2023
A dataset for Audio-Visual Sound Event Detection in Movies
Rajat Hebbar
Digbalay Bose
Krishna Somandepalli
Veena Vijai
Shrikanth Narayanan
47
9
0
14 Feb 2023
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Haohe Liu
Zehua Chen
Yiitan Yuan
Xinhao Mei
Xubo Liu
Danilo Mandic
Wenwu Wang
Mark D. Plumbley
DiffM
177
509
0
29 Jan 2023
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Flavio Schneider
Ojasv Kamal
Zhijing Jin
Bernhard Schölkopf
MGen
115
84
0
27 Jan 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
86
17
0
19 Jan 2023
Training one model to detect heart and lung sound events from single point auscultations
Leander Melms
Robert R. Ilesan
Ulrich Köhler
O. Hildebrandt
R. Conradt
...
Jürgen R. Schaefer
Tobias Müller
J. Obergassel
Nadine Schlicker
M. Hirsch
83
2
0
15 Jan 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
124
299
0
18 Dec 2022
Learning from Taxonomy: Multi-label Few-Shot Classification for Everyday Sound Recognition
Jinhua Liang
Huy P Phan
Emmanouil Benetos
119
12
0
17 Dec 2022
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
79
54
0
15 Dec 2022
Improving trajectory localization accuracy via direction-of-arrival derivative estimation
Ruchi Pandey
Shreya Jaiswal
Huy P Phan
S. Nannuru
70
0
0
07 Dec 2022
Towards Generating Diverse Audio Captions via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
79
2
0
05 Dec 2022