arXiv: 1912.10211
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM, SSL
ArXiv (abs) | PDF | HTML | GitHub (1475★)

Papers citing "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"

50 / 545 papers shown
Title
Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss
Andrew Koh
Chng Eng Siong
144
1
0
29 Jun 2022
QbyE-MLPMixer: Query-by-Example Open-Vocabulary Keyword Spotting using MLPMixer
Jinmiao Huang
W. Gharbieh
Qianhui Wan
Han Suk Shim
Chul Lee
57
10
0
23 Jun 2022
Few-shot Long-Tailed Bird Audio Recognition
Marcos V. Conde
Ui-Jin Choi
43
8
0
22 Jun 2022
Probing Visual-Audio Representation for Video Highlight Detection via Hard-Pairs Guided Contrastive Learning
Shuaicheng Li
Feng Zhang
Kunlin Yang
Lin-Na Liu
Shinan Liu
Jun Hou
Shuai Yi
100
9
0
21 Jun 2022
Redundancy Reduction Twins Network: A Training framework for Multi-output Emotion Regression
Xin Jing
Meishu Song
Andreas Triantafyllopoulos
Zijiang Yang
Björn W. Schuller
32
8
0
18 Jun 2022
It's Time for Artistic Correspondence in Music and Video
Dídac Surís
Carl Vondrick
Bryan C. Russell
Justin Salamon
64
37
0
14 Jun 2022
Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction
Andreas Triantafyllopoulos
Meishu Song
Zijiang Yang
Xin Jing
Björn W. Schuller
45
9
0
14 Jun 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
69
28
0
20 May 2022
The AI Mechanic: Acoustic Vehicle Characterization Neural Networks
Adam M. Terwilliger
J. Siegel
62
2
0
19 May 2022
Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
69
6
0
17 May 2022
Noise-Tolerant Learning for Audio-Visual Action Recognition
Haocheng Han
Qinghua Zheng
Minnan Luo
Kaiyao Miao
Feng Tian
Yuanchun Chen
NoLa
100
9
0
16 May 2022
Learning Representations for New Sound Classes With Continual Self-Supervised Learning
Zhepei Wang
Cem Subakan
Xilin Jiang
Junkai Wu
Efthymios Tzinis
Mirco Ravanelli
Paris Smaragdis
CLL, SSL
123
19
0
15 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
104
44
0
12 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
84
16
0
11 May 2022
Fatigue Prediction in Outdoor Running Conditions using Audio Data
Andreas Triantafyllopoulos
Sandra Ottl
Alexander Gebhard
Esther Rituerto-González
Mirko Jaumann
...
P. Schneeweiss
I. Krauss
Maurice Gerczuk
Shahin Amiriparian
Björn W. Schuller
55
9
0
09 May 2022
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition
Yuan Gong
Jingbo Yu
James R. Glass
89
42
0
06 May 2022
Robustness of Neural Architectures for Audio Event Detection
Juncheng Billy Li
Zheng Wang
Shuhui Qu
Florian Metze
28
1
0
06 May 2022
Relation-guided acoustic scene classification aided with event embeddings
Yuanbo Hou
Bo Kang
Wout Van Hauwermeiren
Dick Botteldooren
52
16
0
01 May 2022
Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain
Karn N. Watcharasupat
Kenneth Ooi
Bhan Lam
Trevor Wong
Zhen-Ting Ong
W. Gan
52
8
0
29 Apr 2022
Pseudo strong labels for large scale weakly supervised audio tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
61
6
0
28 Apr 2022
Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
Dading Chong
Helin Wang
Peilin Zhou
Qingcheng Zeng
79
68
0
27 Apr 2022
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
94
69
0
26 Apr 2022
ATST: Audio Representation Learning with Teacher-Student Transformer
Xian Li
Xiaofei Li
ViT
58
22
0
26 Apr 2022
Caption Feature Space Regularization for Audio Captioning
Yiming Zhang
Hong Yu
Ruoyi Du
Zhanyu Ma
Yuan Dong
122
1
0
18 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
100
59
0
15 Apr 2022
On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice
Ankit Parag Shah
Hira Dhamyal
Yang Gao
Daniel Arancibia
Mario Arancibia
Bhiksha Raj
Rita Singh
79
5
0
11 Apr 2022
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
Marc Delcroix
Jorge Bennasar Vázquez
Tsubasa Ochiai
K. Kinoshita
Yasunori Ohishi
S. Araki
VLM
83
34
0
08 Apr 2022
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection
Dongchao Yang
Helin Wang
Zhongjie Ye
Yuexian Zou
Wenwu Wang
57
0
0
05 Apr 2022
A Mixed supervised Learning Framework for Target Sound Detection
Dongchao Yang
Helin Wang
Yuexian Zou
Wenwu Wang
49
0
0
05 Apr 2022
Improving Target Sound Extraction with Timestamp Information
Helin Wang
Dongchao Yang
Chao Weng
Jianwei Yu
Yuexian Zou
64
10
0
02 Apr 2022
A Temporal-oriented Broadcast ResNet for COVID-19 Detection
Xin Jing
Shuo Liu
Emilia Parada-Cabaleiro
Andreas Triantafyllopoulos
Meishu Song
Zijiang Yang
Björn W. Schuller
84
2
0
31 Mar 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Hubert Siuzdak
Piotr Dura
Pol van Rijn
Nori Jacoby
AI4TS
131
30
0
31 Mar 2022
A Passive Similarity based CNN Filter Pruning for Efficient Acoustic Scene Classification
Arshdeep Singh
Mark D. Plumbley
3DPC
50
14
0
29 Mar 2022
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen Chen
Nana Hou
Yuchen Hu
Heqing Zou
Xiaofeng Qi
Chng Eng Siong
VLM
84
21
0
29 Mar 2022
Audio-text Retrieval in Context
Siyu Lou
Xuenan Xu
Mengyue Wu
K. Yu
75
30
0
25 Mar 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
VLM
100
9
0
25 Mar 2022
Movie Genre Classification by Language Augmentation and Shot Sampling
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
Xin Miao
Jiayi Liu
Huayan Wang
VLM, CLIP
61
1
0
24 Mar 2022
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
Ye Liu
Siyuan Li
Yang Wu
C. Chen
Ying Shan
Xiaohu Qie
ViT
104
151
0
23 Mar 2022
On Adversarial Robustness of Large-scale Audio Visual Learning
Juncheng Billy Li
Shuhui Qu
Xinjian Li
Po-Yao (Bernie) Huang
Florian Metze
AAML
37
7
0
23 Mar 2022
Learning Audio Representations with MLPs
Mashrur M. Morshed
Ahmad Omar Ahsan
H. Mahmud
Md. Kamrul Hasan
72
4
0
16 Mar 2022
A Squeeze-and-Excitation and Transformer based Cross-task System for Environmental Sound Recognition
Jisheng Bai
Jianfeng Chen
Mou Wang
Muhammad Saad Ayub
56
9
0
16 Mar 2022
Dawn of the transformer era in speech emotion recognition: closing the valence gap
Johannes Wagner
Andreas Triantafyllopoulos
H. Wierstorf
Maximilian Schmitt
Felix Burkhardt
F. Eyben
Björn W. Schuller
96
306
0
14 Mar 2022
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
Yuan Gong
Sameer Khurana
Andrew Rouditchenko
James R. Glass
VLM
73
29
0
13 Mar 2022
A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification
Qing Wang
Jun Du
Siyuan Zheng
Yunqing Li
Yajian Wang
...
Hu Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Yannan Wang
Chin-Hui Lee
45
2
0
07 Mar 2022
HEAR: Holistic Evaluation of Audio Representations
Joseph P. Turian
Jordie Shier
H. Khan
Bhiksha Raj
Björn W. Schuller
...
P. Esling
Pranay Manocha
Shinji Watanabe
Zeyu Jin
Yonatan Bisk
135
108
0
06 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
115
30
0
06 Mar 2022
A Summary of the ComParE COVID-19 Challenges
H. Coppock
Ali Akman
Christian Bergler
Maurice Gerczuk
Chloë Brown
...
Sandra Ottl
Panagiotis Tzirakis
A. Batliner
Cecilia Mascolo
Björn W. Schuller
70
9
0
17 Feb 2022
Audio-Based Deep Learning Frameworks for Detecting COVID-19
Dat Ngo
L. D. Pham
Hoang Van Truong
Ş. Kolozali
D. Jarchi
83
5
0
10 Feb 2022
Maximizing Audio Event Detection Model Performance on Small Datasets Through Knowledge Transfer, Data Augmentation, And Pretraining: An Ablation Study
Daniel C. Tompkins
Kshitiz Kumar
Jian Wu
46
5
0
07 Feb 2022
Learning strides in convolutional neural networks
Rachid Riad
O. Teboul
David Grangier
Neil Zeghidour
82
42
0
03 Feb 2022