1912.10211
v1, v2, v3, v4, v5 (latest)
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
ArXiv (abs) | PDF | HTML | Github (1475★)
Papers citing
"PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"
50 / 545 papers shown
Title
Video-Guided Text-to-Music Generation Using Public Domain Movie Collections
Haven Kim
Cheng-i Wang
Weihan Xu
Julian McAuley
Hao-Wen Dong
VGen
38
0
0
01 Jul 2025
A multi-stage augmented multimodal interaction network for fish feeding intensity quantification
Shulong Zhang
Mingyuan Yao
Jiayin Zhao
Xiao Liu
Haihua Wang
16
0
0
17 Jun 2025
Stereo sound event localization and detection based on PSELDnet pretraining and BiMamba sequence modeling
Wenmiao Gao
Yang Xiao
Mamba
28
0
0
16 Jun 2025
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex
S. Ahmed
A. Mustafa
Muhammad Awais
Philip J. B. Jackson
12
1
0
13 Jun 2025
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot
Luis C. Garcia-Peraza-Herrera
Samet Akcay
Ben Glocker
Tom Vercauteren
UQCV
132
0
0
04 Jun 2025
In-the-wild Audio Spatialization with Flexible Text-guided Localization
Tianrui Pan
Jie Liu
Z. Huang
Jie Tang
Gangshan Wu
42
0
0
01 Jun 2025
Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses
Seung Gyu Jeong
Seong-Eun Kim
OOD
23
0
0
28 May 2025
Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles
Miika Toikkanen
June-Woo Kim
31
0
0
28 May 2025
AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
Junqi Zhao
Jinzheng Zhao
Haohe Liu
Yun Chen
Lu Han
Xubo Liu
Mark D. Plumbley
Wenwu Wang
DiffM
38
0
0
28 May 2025
Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance
Badr Moufad
Yazid Janati
Alain Durmus
Ahmed Ghorbel
Eric Moulines
Jimmy Olsson
DiffM
69
0
0
27 May 2025
Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection
Shiqi Zhang
Tuomas Virtanen
23
0
0
27 May 2025
Rhapsody: A Dataset for Highlight Detection in Podcasts
Younghan Park
Anuj Diwan
David Harwath
Eunsol Choi
46
0
0
26 May 2025
Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
Ibuki Kuroyanagi
Tatsuya Komatsu
SSL
21
2
0
25 May 2025
Learning Normal Patterns in Musical Loops
Shayan Dadman
Bernt Arild Bremdal
Børre Bang
Rune Dalmo
29
0
0
22 May 2025
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Zhi-Wei Zhong
Akira Takahashi
Shuyang Cui
Keisuke Toyama
Shusuke Takahashi
Yuki Mitsufuji
VGen
58
0
0
22 May 2025
Discrete Audio Representations for Automated Audio Captioning
Jingguang Tian
Haoqin Sun
Xinhui Hu
Xinkang Xu
70
0
0
21 May 2025
Exploring the Potential of SSL Models for Sound Event Detection
Hanfang Cui
Longfei Song
Li Li
Dongxing Xu
Yanhua Long
85
0
0
17 May 2025
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
Chin-Yun Yu
Marco A. Martínez-Ramírez
Junghyun Koo
Wei-Hsiang Liao
Yuki Mitsufuji
Gyorgy Fazekas
70
1
0
16 May 2025
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Xilin Jiang
Junkai Wu
Vishal B. Choudhari
N. Mesgarani
VLM
80
0
0
11 May 2025
Unleashing the Power of Natural Audio Featuring Multiple Sound Sources
Xize Cheng
Slytherin Wang
Zehan Wang
Rongjie Huang
Tao Jin
Zhou Zhao
66
0
0
24 Apr 2025
Exploring the User Experience of AI-Assisted Sound Searching Systems for Creative Workflows
Haohe Liu
Thomas Deacon
Wenwu Wang
Matt Paradis
Mark D. Plumbley
63
0
0
22 Apr 2025
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture
Meng Cui
Xianghu Yue
Xinyuan Qian
Jinzheng Zhao
Haohe Liu
Xubo Liu
Daoliang Li
Wenwu Wang
133
0
0
21 Apr 2025
Transformation of audio embeddings into interpretable, concept-based representations
Alice Zhang
Edison Thomaz
Lie Lu
76
0
0
18 Apr 2025
A Survey on Cross-Modal Interaction Between Music and Multimodal Data
Sifei Li
Mining Tan
Feier Shen
Minyan Luo
Zijiao Yin
Fan Tang
W. Dong
Changsheng Xu
118
1
0
17 Apr 2025
Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
54
0
0
17 Apr 2025
Policy Optimization Algorithms in a Unified Framework
Shuang Wu
72
0
0
04 Apr 2025
DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos
Yunming Liang
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
109
0
0
28 Mar 2025
Hierarchical Label Propagation: A Model-Size-Dependent Performance Booster for AudioSet Tagging
Ludovic Tuncay
Etienne Labbé
Thomas Pellegrini
VLM
87
0
0
26 Mar 2025
Neural Edge Histogram Descriptors for Underwater Acoustic Target Recognition
Atharva Agashe
Davelle Carreiro
A. V. Dine
Joshua Peeples
69
0
0
17 Mar 2025
Comparative Study of Spike Encoding Methods for Environmental Sound Classification
Andres Larroza
Javier Naranjo-Alcazar
Vicent Ortiz Castelló
P. Zuccarello
185
0
0
14 Mar 2025
Exploring Performance-Complexity Trade-Offs in Sound Event Detection Models
T. Morocutti
Florian Schmid
Jonathan Greif
Francesco Foscarin
Gerhard Widmer
74
0
0
14 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Yu Guo
114
6
0
13 Mar 2025
Long-Video Audio Synthesis with Multi-Agent Collaboration
Yehang Zhang
Xinli Xu
Xiaojie Xu
L. Liu
Yuxiao Chen
DiffM
VGen
106
1
0
13 Mar 2025
TA-V2A: Textually Assisted Video-to-Audio Generation
Yuhuan You
Xihong Wu
T. Qu
DiffM
105
0
0
12 Mar 2025
Mellow: a small audio language model for reasoning
Soham Deshmukh
Satvik Dixit
Rita Singh
Bhiksha Raj
AuLLM
ReLM
LRM
111
4
0
11 Mar 2025
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou
Ke Mei
Yu Lu
Tianyi Wang
Fengyun Rao
132
2
0
03 Mar 2025
CFSum: A Transformer-Based Multi-Modal Video Summarization Framework With Coarse-Fine Fusion
Yaowei Guo
Jiazheng Xing
Xiaojun Hou
Shuo Xin
Juntao Jiang
Demetri Terzopoulos
Chenfanfu Jiang
Yong Liu
ViT
71
0
0
01 Mar 2025
JiTTER: Jigsaw Temporal Transformer for Event Reconstruction for Self-Supervised Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
80
1
0
28 Feb 2025
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
Lei Zhao
Sizhou Chen
Linfeng Feng
Ju Liu
Xuelong Li
Chi Zhang
Xuelong Li
DiffM
MDE
110
1
0
26 Feb 2025
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Xilin Jiang
Sukru Samet Dindar
Vishal B. Choudhari
Stephan Bickel
A. Mehta
Guy M McKhann
A. Flinker
D. Friedman
N. Mesgarani
112
2
0
24 Feb 2025
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation
Yoonjin Chung
Pilsun Eu
Junwon Lee
Keunwoo Choi
Juhan Nam
Ben Sangbae Chon
EGVM
107
4
0
21 Feb 2025
Keep what you need : extracting efficient subnetworks from large audio representation models
David Genova
P. Esling
Tom Hurlin
117
0
0
18 Feb 2025
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
Ailin Huang
Boyong Wu
Bruce Wang
Chao Yan
Chen Hu
...
Tianyu Wang
Wenjin Deng
Wuxun Xie
Weipeng Ming
Wenqing He
AuLLM
121
17
0
17 Feb 2025
NOTA: Multimodal Music Notation Understanding for Visual Large Language Model
Mingni Tang
Jiajia Li
Lu Yang
Zhiqiang Zhang
Jinghao Tian
Zehan Li
Lefei Zhang
Peijie Wang
90
0
0
17 Feb 2025
Music for All: Representational Bias and Cross-Cultural Adaptability of Music Generation Models
Atharva Mehta
Shivam Chauhan
Amirbek Djanibekov
Atharva Kulkarni
Gus Xia
Monojit Choudhury
166
0
0
11 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
147
0
0
05 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
164
4
0
28 Jan 2025
Hybrid Losses for Hierarchical Embedding Learning
Haokun Tian
Stefan Lattner
Brian McFee
Charalampos Saitis
77
0
0
22 Jan 2025
Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
Myeonghoon Ryu
June-Woo Kim
Minseok Oh
Suji Lee
Han Park
122
0
0
20 Jan 2025
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
Pengcheng Zhao
Zhixian He
Fuwei Zhang
Shujin Lin
Fan Zhou
136
2
0
18 Jan 2025