arXiv: 1912.10211
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
Papers citing "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"
Showing 50 of 545 citing papers
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
Siyuan Hou
Shansong Liu
Ruibin Yuan
Wei Xue
Ying Shan
Mangsuo Zhao
Chao Zhang
143
6
0
17 Jan 2025
FlowSep: Language-Queried Sound Separation with Rectified Flow Matching
Yi Yuan
Xubo Liu
Haohe Liu
Mark D. Plumbley
Wenwu Wang
140
9
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
185
3
0
10 Jan 2025
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
Xize Cheng
Dongjie Fu
Xiaoda Yang
Minghui Fang
Ruofan Hu
...
Rongjie Huang
Linjun Li
Yu Chen
Tao Jin
Zhou Zhao
123
1
0
03 Jan 2025
Sound-Based Recognition of Touch Gestures and Emotions for Enhanced Human-Robot Interaction
Yuanbo Hou
Qiaoqiao Ren
Wenwu Wang
Dick Botteldooren
87
0
0
03 Jan 2025
Length-Aware DETR for Robust Moment Retrieval
Sangkwon Park
Jiho Choi
Kyungjune Baek
Hyunjung Shim
81
0
0
31 Dec 2024
LoVA: Long-form Video-to-Audio Generation
Xin Cheng
Xihua Wang
Yihan Wu
Yuyue Wang
Ruihua Song
VGen
DiffM
115
3
0
31 Dec 2024
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance
Yaoyun Zhang
Xuenan Xu
Mengyue Wu
VGen
97
1
0
24 Dec 2024
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Alex Schwing
Yuki Mitsufuji
VGen
294
18
0
19 Dec 2024
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Qingming Huang
112
0
0
18 Dec 2024
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp
Andreas Triantafyllopoulos
M. Milling
Björn Schuller
270
0
0
16 Dec 2024
When Vision Models Meet Parameter Efficient Look-Aside Adapters Without Large-Scale Audio Pretraining
Juan Yeo
Jinkwan Jang
Kyubyung Chae
Seongkyu Mun
Taesup Kim
VLM
133
0
0
08 Dec 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Viana
186
0
0
24 Nov 2024
Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
Eleonora Mancini
Francesco Paissan
Paolo Torroni
Mirco Ravanelli
Cem Subakan
83
2
0
12 Nov 2024
PSELDNets: Pre-trained Neural Networks on a Large-scale Synthetic Dataset for Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Fang Kang
Feiran Yang
Wenwu Wang
Mark D. Plumbley
J. Yang
68
1
0
10 Nov 2024
Does the Definition of Difficulty Matter? Scoring Functions and their Role for Curriculum Learning
Simon Rampp
M. Milling
Andreas Triantafyllopoulos
Björn Schuller
71
1
0
01 Nov 2024
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Satvik Dixit
Soham Deshmukh
Bhiksha Raj
59
1
0
01 Nov 2024
Retrieval-Augmented Approach for Unsupervised Anomalous Sound Detection and Captioning without Model Training
Ryoya Ogura
Tomoya Nishida
Yohei Kawaguchi
24
1
0
29 Oct 2024
Timbre Difference Capturing in Anomalous Sound Detection
Tomoya Nishida
Harsh Purohit
Kota Dohi
Takashi Endo
Yohei Kawaguchi
56
0
0
29 Oct 2024
USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis
Luca Jiang-Tao Yu
Running Zhao
Sijie Ji
Edith C.H. Ngai
Chenshu Wu
52
0
0
29 Oct 2024
Knowledge Distillation for Real-Time Classification of Early Media in Voice Communications
Kemal Altwlkany
Hadžem Hadžić
Amar Kurić
Emanuel Lacic
VLM
19
0
0
28 Oct 2024
ST-ITO: Controlling Audio Effects for Style Transfer with Inference-Time Optimization
C. Steinmetz
Shubhr Singh
Marco Comunità
Ilias Ibnyahya
Shanxin Yuan
Emmanouil Benetos
Joshua Reiss
81
9
0
28 Oct 2024
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
K R Prajwal
Bowen Shi
Matthew Lee
Apoorv Vyas
Andros Tjandra
...
Baishan Guo
Huiyu Wang
Triantafyllos Afouras
David Kant
Wei-Ning Hsu
80
5
0
27 Oct 2024
Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation
Junwon Lee
Modan Tailleur
Laurie M. Heller
Keunwoo Choi
Mathieu Lagrange
Brian McFee
Keisuke Imoto
Yuki Okamoto
54
5
0
23 Oct 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
121
9
0
16 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
136
5
0
14 Oct 2024
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
Wenxi Chen
Ziyang Ma
Xiquan Li
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Kai Yu
Xie Chen
100
7
0
12 Oct 2024
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
Xiquan Li
Wenxi Chen
Ziyang Ma
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Qiuqiang Kong
Xie Chen
VLM
121
6
0
12 Oct 2024
Movie Trailer Genre Classification Using Multimodal Pretrained Features
Serkan Sulun
Paula Viana
M. Davies
CLIP
74
3
0
11 Oct 2024
A Recurrent Neural Network Approach to the Answering Machine Detection Problem
Kemal Altwlkany
Sead Delalic
Elmedin Selmanovic
Adis Alihodzic
Ivica Lovric
32
0
0
07 Oct 2024
Pre-training with Synthetic Patterns for Audio
Yuchi Ishikawa
Tatsuya Komatsu
Yoshimitsu Aoki
58
0
0
01 Oct 2024
InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
Mengze Hong
Chen Jason Zhang
Lingxiao Yang
Yuanfeng Song
Di Jiang
86
2
0
29 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
121
1
0
25 Sep 2024
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
98
5
0
22 Sep 2024
Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics
Burooj Ghani
Vincent J. Kalkman
Bob Planqué
Willem-Pier Vellinga
L. Gill
Dan Stowell
VLM
69
5
0
21 Sep 2024
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
Carlos Hernandez-Olivan
Marc Delcroix
Tsubasa Ochiai
Daisuke Niizumi
Naohiro Tawara
Tomohiro Nakatani
Shoko Araki
54
2
0
19 Sep 2024
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework
Yuhang Jia
Yang Chen
Jinghua Zhao
Shiwan Zhao
Wenjia Zeng
Yong Chen
Yong Qin
DiffM
58
1
0
19 Sep 2024
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection
Gabriel Bibbó
Thomas Deacon
Arshdeep Singh
Mark D. Plumbley
26
0
0
17 Sep 2024
Machine listening in a neonatal intensive care unit
Modan Tailleur
Vincent Lostanlen
Jean-Philippe Riviere
Pierre Aumond
55
0
0
16 Sep 2024
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval
Yifei Xin
Xuxin Cheng
Zhihong Zhu
Xusheng Yang
Yuexian Zou
DiffM
93
5
0
16 Sep 2024
Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement
Yudong Yang
Zhan Liu
Wenyi Yu
Guangzhi Sun
Qiuqiang Kong
Chao Zhang
DiffM
104
1
0
15 Sep 2024
A Survey of Foundation Models for Music Understanding
Wenjun Li
Ying Cai
Ziyang Wu
Wenyi Zhang
Yifan Chen
...
Junwei Han
Bao Ge
Tianming Liu
Lin Gan
Tuo Zhang
120
2
0
15 Sep 2024
Effective Pre-Training of Audio Transformers for Sound Event Detection
Florian Schmid
T. Morocutti
Francesco Foscarin
Jan Schlüter
Paul Primus
Gerhard Widmer
ViT
65
2
0
14 Sep 2024
Improvements of Discriminative Feature Space Training for Anomalous Sound Detection in Unlabeled Conditions
Takuya Fujimura
Ibuki Kuroyanagi
Tomoki Toda
34
1
0
14 Sep 2024
LMAC-TD: Producing Time Domain Explanations for Audio Classifiers
Eleonora Mancini
Francesco Paissan
Mirco Ravanelli
Cem Subakan
60
2
0
13 Sep 2024
MambaFoley: Foley Sound Generation using Selective State-Space Models
Marco Furio Colombo
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
Mamba
112
1
0
13 Sep 2024
Exploring Differences between Human Perception and Model Inference in Audio Event Recognition
Yizhou Tan
Yanru Wu
Yuanbo Hou
Xin Xu
Hui Bu
Shengchen Li
Dick Botteldooren
Mark D. Plumbley
58
0
0
10 Sep 2024
WaveTransfer: A Flexible End-to-end Multi-instrument Timbre Transfer with Diffusion
Teysir Baoueb
Xiaoyu Bie
Hicham Janati
Gaël Richard
DiffM
46
1
0
06 Sep 2024
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Zhen Ye
Peiwen Sun
Jiahe Lei
Hongzhan Lin
Xu Tan
...
Jianyi Chen
Jiahao Pan
Qifeng Liu
Yike Guo
Wei Xue
AuLLM
69
19
0
30 Aug 2024
Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers
Qian Wang
Zhaoyang Bu
Jiaxuan Mao
Wenyu Zhu
Jingya Zhao
Wei Du
Guochao Shi
Min Zhou
Si Chen
Jieming Qu
MedIm
68
0
0
28 Aug 2024