1912.10211
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
Github (1475★)
Papers citing
"PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"
50 / 545 papers shown
Title
On the choice of the optimal temporal support for audio classification with Pre-trained embeddings
Aurian Quélennec
Michel Olvera
Geoffroy Peeters
S. Essid
69
2
0
21 Dec 2023
Evaluation of Barlow Twins and VICReg self-supervised learning for sound patterns of bird and anuran species
Fábio Felix Dias
M. Ponti
Mílton Cezar Ribeiro
R. Minghim
SSL
33
0
0
18 Dec 2023
Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction
Yuanbo Hou
Qiaoqiao Ren
Siyang Song
Yuxin Song
Wenwu Wang
Dick Botteldooren
57
1
0
15 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
91
35
0
15 Dec 2023
Speaker-Text Retrieval via Contrastive Learning
Xuechen Liu
Xin Wang
Erica Cooper
Xiaoxiao Miao
Junichi Yamagishi
VLM
45
1
0
11 Dec 2023
Building Ears for Robots: Machine Hearing in the Age of Autonomy
Xuan Zhong
17
0
0
04 Dec 2023
Optimizing Context-Enhanced Relational Joins
Viktor Sanca
Manos Chatzakis
Anastasia Ailamaki
103
2
0
03 Dec 2023
Audio Prompt Tuning for Universal Sound Separation
Yuzhuo Liu
Xubo Liu
Yan Zhao
Yuanyuan Wang
Rui Xia
Pingchuan Tain
Yuxuan Wang
VLM
79
6
0
30 Nov 2023
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
Pilhyeon Lee
Hyeran Byun
67
11
0
30 Nov 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
153
20
0
27 Nov 2023
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Xiaohan Ding
Yiyuan Zhang
Yixiao Ge
Sijie Zhao
Lin Song
Xiangyu Yue
Ying Shan
VLM
AI4TS
SSL
104
129
0
27 Nov 2023
Multi-View Spectrogram Transformer for Respiratory Sound Classification
Wentao He
Yuchen Yan
Jianfeng Ren
Ruibin Bai
Xudong Jiang
MedIm
ViT
48
10
0
16 Nov 2023
AQUATK: An Audio Quality Assessment Toolkit
Ashvala Vinay
Alexander Lerch
32
2
0
16 Nov 2023
AI-based soundscape analysis: Jointly identifying sound sources and predicting annoyance
Yuanbo Hou
Qiaoqiao Ren
Huizhong Zhang
A. Mitchell
F. Aletta
Jian Kang
Dick Botteldooren
60
17
0
15 Nov 2023
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
Ge Zhu
Yutong Wen
M. Carbonneau
Zhiyao Duan
DiffM
74
8
0
15 Nov 2023
TACNET: Temporal Audio Source Counting Network
Amirreza Ahmadnejad
Ahmad Mahmmodian Darviishani
Mohmmad Mehrdad Asadi
Sajjad Saffariyeh
Pedram Yousef
Emad Fatemizadeh
62
2
0
04 Nov 2023
ATGNN: Audio Tagging Graph Neural Network
Shubhr Singh
Christian J. Steinmetz
Emmanouil Benetos
Huy P Phan
Dan Stowell
ViT
GNN
52
9
0
02 Nov 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
78
10
0
25 Oct 2023
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models
Florian Schmid
Khaled Koutini
Gerhard Widmer
42
11
0
24 Oct 2023
BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval
Kaixing Yang
Xukun Zhou
Xulong Tang
Ran Diao
Hongyan Liu
Jun He
Zhaoxin Fan
66
3
0
16 Oct 2023
A cry for help: Early detection of brain injury in newborns
Charles C. Onu
Samantha Latremouille
Arsenii Gorin
Junhao Wang
Innocent Udeogu
...
O. Kehinde
Muhammad A. Salisu
Datonye Briggs
Yoshua Bengio
Doina Precup
115
2
0
12 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
102
3
0
10 Oct 2023
Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification
Yuanbo Hou
Siyang Song
Chuang Yu
Wenwu Wang
Dick Botteldooren
55
7
0
05 Oct 2023
Prompting Audios Using Acoustic Properties For Emotion Representation
Hira Dhamyal
Benjamin Elizalde
Soham Deshmukh
Huaming Wang
Bhiksha Raj
Rita Singh
52
4
0
03 Oct 2023
GASS: Generalizing Audio Source Separation with Large-scale Data
Jordi Pons
Xiaoyu Liu
Santiago Pascual
Joan Serrà
71
12
0
29 Sep 2023
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Shih-Lun Wu
Xuankai Chang
Gordon Wichern
Jee-weon Jung
François G. Germain
Jonathan Le Roux
Shinji Watanabe
78
20
0
29 Sep 2023
Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification
M. Milling
Andreas Triantafyllopoulos
Iosif Tsangko
Simon Rampp
F. Schlüter
111
3
0
28 Sep 2023
Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description
Youbin Jeon
Yanzhen Ren
VLM
76
0
0
28 Sep 2023
Audio classification with Dilated Convolution with Learnable Spacings
Ismail Khalfaoui-Hassani
T. Masquelier
Thomas Pellegrini
69
1
0
25 Sep 2023
Attention Is All You Need For Blind Room Volume Estimation
Chunxiu Wang
Mao-shen Jia
Meiran Li
C. Bao
Wenyu Jin
52
7
0
23 Sep 2023
Towards Lexical Analysis of Dog Vocalizations via Online Videos
Yufei Wang
Chunhao Zhang
Jieyi Huang
Mengyue Wu
Ke Zhu
39
1
0
21 Sep 2023
Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners
Jieyi Huang
Chunhao Zhang
Yufei Wang
Mengyue Wu
Ke Zhu
28
0
0
21 Sep 2023
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
Vassilis Katsouros
CLIP
82
7
0
21 Sep 2023
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification
Meng Liu
K. Liang
Dayu Hu
Hao Yu
Yue Liu
Lingyuan Meng
Wenxuan Tu
Sihang Zhou
Xinwang Liu
74
26
0
21 Sep 2023
A Large-scale Dataset for Audio-Language Representation Learning
Luoyi Sun
Xuenan Xu
Mengyue Wu
Weidi Xie
87
27
0
20 Sep 2023
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Yatong Bai
Trung D. Q. Dang
Dung N. Tran
K. Koishida
Somayeh Sojoudi
DiffM
162
23
0
19 Sep 2023
Synth-AC: Enhancing Audio Captioning with Synthetic Supervision
Feiyang Xiao
Qiaoxi Zhu
Jian Guan
Xubo Liu
Haohe Liu
Kejia Zhang
Wenwu Wang
59
2
0
18 Sep 2023
Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval
Kaiyi Luo
Xulong Zhang
Jianzong Wang
Huaxiong Li
Ning Cheng
Jing Xiao
109
2
0
16 Sep 2023
Audio-free Prompt Tuning for Language-Audio Models
Yiming Li
Xiangdong Wang
Hong Liu
CLIP
VLM
74
10
0
15 Sep 2023
Audio Difference Learning for Audio Captioning
Tatsuya Komatsu
Yusuke Fujita
K. Takeda
Tomoki Toda
76
4
0
15 Sep 2023
SSL-Net: A Synergistic Spectral and Learning-based Network for Efficient Bird Sound Classification
Yiyuan Yang
Kaichen Zhou
Niki Trigoni
Andrew Markham
61
5
0
15 Sep 2023
Retrieval-Augmented Text-to-Audio Generation
Yiitan Yuan
Haohe Liu
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
RALM
80
28
0
14 Sep 2023
Training Audio Captioning Models without Audio
Soham Deshmukh
Benjamin Elizalde
Dimitra Emmanouilidou
Bhiksha Raj
Rita Singh
Huaming Wang
61
20
0
14 Sep 2023
Leveraging Foundation models for Unsupervised Audio-Visual Segmentation
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Xiatian Zhu
VOS
59
5
0
13 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture
Meng Cui
Xubo Liu
Haohe Liu
Zhuangzhuang Du
Tao Chen
Guoping Lian
Daoliang Li
Wenwu Wang
79
5
0
10 Sep 2023
NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement
Wen Wang
Dongchao Yang
Qichen Ye
Bowen Cao
Yuexian Zou
DiffM
86
3
0
03 Sep 2023
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
Etienne Labbé
Thomas Pellegrini
J. Pinquier
79
14
0
01 Sep 2023
General Purpose Audio Effect Removal
Matthew Rice
C. Steinmetz
Georgy Fazekas
Joshua D. Reiss
73
8
0
30 Aug 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
62
0
0
30 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
188
39
0
24 Aug 2023