ResearchTrend.AI
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM, SSL
ArXiv (abs) · PDF · HTML · GitHub (1475★)

Papers citing "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"

45 / 545 papers shown
Sound Event Detection with Adaptive Frequency Selection
Zhepei Wang
Jonah Casebeer
Adam Clemmitt
Efthymios Tzinis
Paris Smaragdis
52
2
0
17 May 2021
The Benefit Of Temporally-Strong Labels In Audio Event Classification
Shawn Hershey
D. Ellis
Eduardo Fonseca
A. Jansen
Caroline Liu
Channing Moore
Manoj Plakal
81
106
0
14 May 2021
Audio Captioning with Composition of Acoustic and Semantic Information
Aysegül Özkaya Eren
M. Sert
63
3
0
13 May 2021
Voice activity detection in the wild: A data-driven approach using teacher-student training
Heinrich Dinkel
Shuai Wang
Xuenan Xu
Mengyue Wu
K. Yu
VLM
40
33
0
10 May 2021
Audio Retrieval with Natural Language Queries
Andreea-Maria Oncescu
A. Sophia Koepke
João F. Henriques
Zeynep Akata
Samuel Albanie
63
79
0
05 May 2021
Self-Supervised Learning from Automatically Separated Sound Scenes
Eduardo Fonseca
A. Jansen
D. Ellis
Scott Wisdom
Marco Tagliasacchi
J. Hershey
Manoj Plakal
Shawn Hershey
R. C. Moore
Xavier Serra
SSL
81
13
0
05 May 2021
Shot Contrastive Self-Supervised Learning for Scene Boundary Detection
Shixing Chen
Xiaohan Nie
David D. Fan
Dongqing Zhang
Vimal Bhat
Raffay Hamid
SSL
77
62
0
28 Apr 2021
Multimodal Self-Supervised Learning of General Audio Representations
Luyu Wang
Pauline Luc
Adrià Recasens
Jean-Baptiste Alayrac
Aaron van den Oord
SSL
137
41
0
26 Apr 2021
The Influence of Audio on Video Memorability with an Audio Gestalt Regulated Video Memorability System
Lorin Sweeney
Graham Healy
Alan F. Smeaton
55
11
0
23 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
366
594
0
22 Apr 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Yanbei Chen
Yongqin Xian
A. Sophia Koepke
Ying Shan
Zeynep Akata
147
83
0
22 Apr 2021
Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
Narendra Chaudhary
Sanchit Misra
Dhiraj D. Kalamkar
A. Heinecke
E. Georganas
Barukh Ziv
Menachem Adelman
Bharat Kaul
44
9
0
16 Apr 2021
AST: Audio Spectrogram Transformer
Yuan Gong
Yu-An Chung
James R. Glass
ViT
200
887
0
05 Apr 2021
An Audio-Based Deep Learning Framework For BBC Television Programme Classification
L. D. Pham
C. Baume
Qiuqiang Kong
Tassadaq Hussain
Wenwu Wang
Mark D. Plumbley
165
4
0
02 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL, AI4TS
137
128
0
30 Mar 2021
GISE-51: A scalable isolated sound events dataset
Sarthak Yadav
Mary Ellen Foster
46
3
0
23 Mar 2021
Learning spectro-temporal representations of complex sounds with parameterized neural networks
Rachid Riad
Julien Karadayi
Anne-Catherine Bachoud-Lévi
Emmanuel Dupoux
49
7
0
12 Mar 2021
Multi-Format Contrastive Learning of Audio Representations
Luyu Wang
Aaron van den Oord
95
59
0
11 Mar 2021
EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition
Maurice Gerczuk
Shahin Amiriparian
Sandra Ottl
Björn Schuller
95
59
0
10 Mar 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM, ViT, MDE
214
1,029
0
04 Mar 2021
Fast threshold optimization for multi-label audio tagging using Surrogate gradient learning
Thomas Pellegrini
T. Masquelier
BDL
25
7
0
01 Mar 2021
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning
Xuenan Xu
Heinrich Dinkel
Mengyue Wu
Zeyu Xie
Kai Yu
75
60
0
23 Feb 2021
Speech enhancement with weakly labelled data from AudioSet
Qiuqiang Kong
Haohe Liu
Xingjian Du
Li Chen
Rui Xia
Yuxuan Wang
82
18
0
19 Feb 2021
Enhancing Audio Augmentation Methods with Consistency Learning
Turab Iqbal
Karim Helwani
A. Krishnaswamy
Wenwu Wang
58
5
0
09 Feb 2021
A Global-local Attention Framework for Weakly Labelled Audio Tagging
Helin Wang
Yuexian Zou
Wenwu Wang
43
6
0
03 Feb 2021
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
Yuan Gong
Yu-An Chung
James R. Glass
VLM
199
147
0
02 Feb 2021
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Xudong Lin
Gedas Bertasius
Jue Wang
Shih-Fu Chang
Devi Parikh
Lorenzo Torresani
VGen
102
67
0
28 Jan 2021
LEAF: A Learnable Frontend for Audio Classification
Neil Zeghidour
O. Teboul
Félix de Chaumont Quitry
Marco Tagliasacchi
VLM, AAML
131
148
0
21 Jan 2021
Towards duration robust weakly supervised sound event detection
Heinrich Dinkel
Mengyue Wu
Kai Yu
54
49
0
19 Jan 2021
Leveraging Audio Gestalt to Predict Media Memorability
Lorin Sweeney
Graham Healy
Alan F. Smeaton
61
6
0
31 Dec 2020
Audio-Visual Event Recognition through the lens of Adversary
Juncheng Li
Kaixin Ma
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
AAML
65
9
0
15 Nov 2020
Large-Scale MIDI-based Composer Classification
Qiuqiang Kong
Keunwoo Choi
Yuxuan Wang
45
19
0
28 Oct 2020
Perceptual Loss based Speech Denoising with an ensemble of Audio Pattern Recognition and Self-Supervised Models
Saurabh Kataria
Jesús Villalba
Najim Dehak
VLM, SSL
68
34
0
22 Oct 2020
Urban Sound Classification : striving towards a fair comparison
Augustin Arnault
Baptiste Hanssens
Nicolas Riche
53
9
0
22 Oct 2020
GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music
Qiuqiang Kong
Bochen Li
Jitong Chen
Yuxuan Wang
394
81
0
11 Oct 2020
An Audio-Video Deep and Transfer Learning Framework for Multimodal Emotion Recognition in the wild
D. Dresvyanskiy
E. Ryumina
Heysem Kaya
M. Markitantov
A. Karpov
Wolfgang Minker
CVBM
83
17
0
07 Oct 2020
High-resolution Piano Transcription with Pedals by Regressing Onset and Offset Times
Qiuqiang Kong
Bochen Li
Xuchen Song
Yuan Wan
Yuxuan Wang
403
112
0
05 Oct 2020
FSD50K: An Open Dataset of Human-Labeled Sound Events
Eduardo Fonseca
Xavier Favory
Jordi Pons
F. Font
Xavier Serra
142
467
0
01 Oct 2020
CRNNs for Urban Sound Tagging with spatiotemporal context
Augustin Arnault
Nicolas Riche
58
7
0
24 Aug 2020
Acoustic Scene Classification with Spectrogram Processing Strategies
Helin Wang
Yuexian Zou
Dading Chong
56
13
0
06 Jul 2020
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
190
375
0
29 Jun 2020
Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking
Eduardo Fonseca
Shawn Hershey
Manoj Plakal
D. Ellis
A. Jansen
R. C. Moore
Xavier Serra
NoLa
98
23
0
02 May 2020
Voice activity detection in the wild via weakly supervised sound event detection
Heinrich Dinkel
Yefei Chen
Mengyue Wu
Kai Yu
38
2
0
27 Mar 2020
Towards Learning a Universal Non-Semantic Representation of Speech
Joel Shor
A. Jansen
Ronnie Maor
Oran Lang
Omry Tuval
Félix de Chaumont Quitry
Marco Tagliasacchi
Ira Shavitt
Dotan Emanuel
Yinnon A. Haviv
SSL
142
160
0
25 Feb 2020
Source separation with weakly labelled data: An approach to computational auditory scene analysis
Qiuqiang Kong
Yuxuan Wang
Xuchen Song
Yin Cao
Wenwu Wang
Mark D. Plumbley
94
47
0
06 Feb 2020