PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

21 December 2019

Yuxuan Wang

Papers citing "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"

50 / 545 papers shown

Title
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection Ke Chen Xingjian Du Bilei Zhu Zejun Ma Taylor Berg-Kirkpatrick Shlomo Dubnov ViT 169 277 0 02 Feb 2022
Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data A. Shirian Krishna Somandepalli T. Guha SSL 132 10 0 31 Jan 2022
Anomalous Sound Detection using Spectral-Temporal Information Fusion Youde Liu Jian Guan Qiaoxi Zhu Wenwu Wang 67 58 0 14 Jan 2022
Local Information Assisted Attention-free Decoder for Audio Captioning Feiyang Xiao Jian Guan Haiyan Lan Qiaoxi Zhu Wenwu Wang 98 11 0 10 Jan 2022
An Ensemble of Deep Learning Frameworks Applied For Predicting Respiratory Anomalies L. D. Pham Dat Ngo T. Hoang Alexander Schindler Ian Mcloughlin 69 5 0 09 Jan 2022
Detect what you want: Target Sound Detection Dongchao Yang Helin Wang Yuexian Zou Fan Cui Chao Weng 95 7 0 19 Dec 2021
Audio Retrieval with Natural Language Queries: A Benchmark Study A. Sophia Koepke Andreea-Maria Oncescu João F. Henriques Zeynep Akata Samuel Albanie 76 102 0 17 Dec 2021
MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling Yusong Wu Ethan Manilow Yi Deng Rigel Swavely Kyle Kastner Tim Cooijmans Aaron Courville Cheng-Zhi Anna Huang Jesse Engel 87 45 0 17 Dec 2021
Chimpanzee voice prints? Insights from transfer learning experiments from human voices Maël Leroux Orestes Uxio Gutierrez Al-Khudhairy N. Perony S. Townsend 16 7 0 15 Dec 2021
Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data Ke Chen Xingjian Du Bilei Zhu Zejun Ma Taylor Berg-Kirkpatrick Shlomo Dubnov 109 46 0 15 Dec 2021
Towards Learning Universal Audio Representations Luyu Wang Pauline Luc Yan Wu Adrià Recasens Lucas Smaira ... Andrew Jaegle Jean-Baptiste Alayrac Sander Dieleman João Carreira Aaron van den Oord SSL 124 71 0 23 Nov 2021
Effect of noise suppression losses on speech distortion and ASR performance Sebastian Braun H. Gamper 56 21 0 23 Nov 2021
SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays Thi Ngoc Tho Nguyen Douglas L. Jones Karn N. Watcharasupat Huy P Phan W. Gan 70 37 0 16 Nov 2021
Who calls the shots? Rethinking Few-Shot Learning for Audio Yu Wang Nicholas J. Bryan Justin Salamon M. Cartwright J. P. Bello VLM 130 25 0 18 Oct 2021
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks Sangeeta Srivastava Yun Wang Andros Tjandra Anurag Kumar Chunxi Liu Kritika Singh Yatharth Saraf SSL 99 25 0 14 Oct 2021
Diverse Audio Captioning via Adversarial Training Xinhao Mei Xubo Liu Jianyuan Sun Mark D. Plumbley Wenwu Wang DiffM GAN 102 28 0 13 Oct 2021
Multistage linguistic conditioning of convolutional layers for speech emotion recognition Andreas Triantafyllopoulos U. Reichel Shuo Liu Simon Huber F. Eyben Björn W. Schuller 90 11 0 13 Oct 2021
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information Zhongjie Ye Helin Wang Dongchao Yang Yuexian Zou 98 28 0 12 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos Heeseung Yun Youngjae Yu Wonsuk Yang Kangil Lee Gunhee Kim 100 86 0 11 Oct 2021
Efficient Training of Audio Transformers with Patchout Khaled Koutini Jan Schluter Hamid Eghbalzadeh Gerhard Widmer ViT 167 262 0 11 Oct 2021
Can Audio Captions Be Evaluated with Image Caption Metrics? Zelin Zhou Zhiling Zhang Xuenan Xu Zeyu Xie Mengyue Wu Kenny Q. Zhu 68 46 0 10 Oct 2021
A Mutual learning framework for Few-shot Sound Event Detection Dongchao Yang Helin Wang Yuexian Zou Zhongjie Ye Wenwu Wang 145 26 0 09 Oct 2021
MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection Chandan K. A. Reddy Vishak Gopa Harishchandra Dubey Sergiy Matusevych Ross Cutler R. Aichner 42 0 0 08 Oct 2021
A study of the robustness of raw waveform based speaker embeddings under mismatched conditions Ge Zhu Frank Cwitkowitz Z. Duan 55 2 0 08 Oct 2021
Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study Dawei Liang Yangyang Shi Yun Wang Nayan Singhal Alex Xiao Jonathan Shaw Edison Thomaz Ozlem Kalinli M. Seltzer 37 4 0 07 Oct 2021
Fairness and underspecification in acoustic scene classification: The case for disaggregated evaluations Andreas Triantafyllopoulos M. Milling Konstantinos Drossos Björn W. Schuller 60 7 0 04 Oct 2021
Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging Zhiling Zhang Zelin Zhou Haifeng Tang Guangwei Li Mengyue Wu Kenny Q. Zhu 119 4 0 03 Oct 2021
SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection Thi Ngoc Tho Nguyen Karn N. Watcharasupat Ngoc Khanh Nguyen Douglas L. Jones W. Gan 62 49 0 01 Oct 2021
SpliceOut: A Simple and Efficient Audio Augmentation Method Arjit Jain Pranay Reddy Samala Deepak Mittal Preethi Jyothi M. Singh 130 11 0 30 Sep 2021
Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model Zhongwei Teng Quchen Fu Jules White Maria E. Powell Douglas C. Schmidt 46 5 0 06 Sep 2021
Parsing Birdsong with Deep Audio Embeddings Irina Tolkova Brian Chu Marcel Hedman Stefan Kahl Holger Klinck 56 11 0 20 Aug 2021
RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform Youxuan Ma Zongze Ren Shugong Xu 83 40 0 12 Aug 2021
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization Andrew Koh Fuzhao Xue Chng Eng Siong 68 20 0 10 Aug 2021
The EIHW-GLAM Deep Attentive Multi-model Fusion System for Cough-based COVID-19 Recognition in the DiCOVA 2021 Challenge Zhao Ren Yi Chang Björn W. Schuller 59 0 0 06 Aug 2021
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning Xinhao Mei Qiushi Huang Xubo Liu Gengyun Chen Jingqian Wu ... Tom Ko H. Tang Xingkun Shao Mark D. Plumbley Wenwu Wang 91 54 0 05 Aug 2021
DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs J. Nistal Stefan Lattner G. Richard 74 9 0 03 Aug 2021
Improving Polyphonic Sound Event Detection on Multichannel Recordings with the Sørensen-Dice Coefficient Loss and Transfer Learning Karn N. Watcharasupat Thi Ngoc Tho Nguyen Ngoc Khanh Nguyen Zhen Jian Lee Douglas L. Jones W. Gan 116 0 0 22 Jul 2021
Audio Captioning Transformer Xinhao Mei Xubo Liu Qiushi Huang Mark D. Plumbley Wenwu Wang ViT 94 78 0 21 Jul 2021
A Multimodal Machine Learning Framework for Teacher Vocal Delivery Evaluation Hang Li Yunxing Kang Y. Hao Wenbiao Ding Zhongqin Wu Zitao Liu 40 4 0 15 Jul 2021
Weakly-Supervised Classification and Detection of Bird Sounds in the Wild. A BirdCLEF 2021 Solution Marcos V. Conde K. Shubham Prateek Agnihotri N. D. Movva S. Bessenyei 22 15 0 10 Jul 2021
Multi-modal Affect Analysis using standardized data within subjects in the Wild Sachihiro Youoku Takahisa Yamamoto Junya Saito A. Uchida Xiaoyue Mi Ziqiang Shi Liu Liu Zhongling Liu Osafumi Nakayama Kentaro Murase CVBM 58 6 0 07 Jul 2021
TENET: A Time-reversal Enhancement Network for Noise-robust ASR Fu-An Chao Shao-Wei Fan-Jiang Bi-Cheng Yan J. Hung Berlin Chen 60 13 0 04 Jul 2021
Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks Eduardo Fonseca Andrés Ferraro Xavier Serra AI4TS 131 9 0 01 Jul 2021
DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection Thi Ngoc Tho Nguyen Karn N. Watcharasupat Ngoc Khanh Nguyen Douglas L. Jones W. Gan 70 16 0 29 Jun 2021
Do sound event representations generalize to other audio tasks? A case study in audio transfer learning Anurag Kumar Yun Wang V. Ithapu Christian Fuegen 76 3 0 21 Jun 2021
Deep Learning Frameworks Applied For Audio-Visual Scene Classification L. D. Pham Alexander Schindler Mina Schütz Jasmin Lampert S. Schlarb Ross King 57 9 0 12 Jun 2021
Anomalous Sound Detection Using a Binary Classification Model and Class Centroids Ibuki Kuroyanagi Tomoki Hayashi K. Takeda Tomoki Toda 36 8 0 11 Jun 2021
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition Cheng-I Jeff Lai Yang Zhang Alexander H. Liu Shiyu Chang Yi-Lun Liao Yung-Sung Chuang Kaizhi Qian Sameer Khurana David D. Cox James R. Glass VLM 162 78 0 10 Jun 2021
ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition S. Verbitskiy Vladimir Berikov Viacheslav Vyshegorodtsev 109 75 0 03 Jun 2021
Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification Dongchao Yang Helin Wang Yuexian Zou 26 5 0 21 May 2021