PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM, SSL

Papers citing "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"

50 / 545 papers shown
Title
Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning
Yuanbo Hou
Siyang Song
Cheng Luo
A. Mitchell
Qiaoqiao Ren
Weicheng Xie
Jian Kang
Wenwu Wang
Dick Botteldooren
71
6
0
23 Aug 2023
CED: Consistent ensemble distillation for audio tagging
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
76
24
0
23 Aug 2023
Audio Generation with Multiple Conditional Diffusion Model
Zhifang Guo
Jianguo Mao
Ruijie Tao
Long Yan
Kazushige Ouchi
Hong Liu
Xiangdong Wang
DiffM
97
14
0
23 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
78
7
0
23 Aug 2023
Example-Based Framework for Perceptually Guided Audio Texture Generation
Purnima Kamath
Chitralekha Gupta
L. Wyse
Suranga Nanayakkara
48
4
0
23 Aug 2023
MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation
Najmeh Sadoughi
Xinyu Li
Avijit Vajpayee
D. Fan
Bing Shuai
H. Santos-Villalobos
Vimal Bhat
M. Rohith
75
4
0
22 Aug 2023
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang
Jianbo Ma
Santiago Pascual
Richard Cartwright
Weidong (Tom) Cai
VGen
110
43
0
18 Aug 2023
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries
J. Wilkins
Justin Salamon
Magdalena Fuentes
J. P. Bello
Oriol Nieto
CLIP
55
5
0
17 Aug 2023
META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection
Jinbo Hu
Yin Cao
Ming Wu
Feiran Yang
Ziying Yu
Wenwu Wang
Mark D. Plumbley
J. Yang
VLM
74
6
0
17 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
113
1
0
14 Aug 2023
The Sound Demixing Challenge 2023 – Music Demixing Track
Giorgio Fabbro
Stefan Uhlich
Chieh-Hsin Lai
Woosung Choi
Marco A. Martínez-Ramírez
...
Jun Hyung Lee
Yuanliang Dong
Xinran Zhang
Jiafeng Liu
Yuki Mitsufuji
124
23
0
14 Aug 2023
Separate Anything You Describe
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yiitan Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
105
52
0
09 Aug 2023
Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning
Diep Luong
Minh Tran
Shayan Gharib
Konstantinos Drossos
Tuomas Virtanen
46
4
0
09 Aug 2023
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
Tianyu Liu
Peng Zhang
Wei Huang
Yufei Zha
Tao You
Yanni Zhang
SSL
67
2
0
09 Aug 2023
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets
Paul Primus
Khaled Koutini
Gerhard Widmer
71
13
0
08 Aug 2023
Finding Tori: Self-supervised Learning for Analyzing Korean Folk Song
Danbinaerin Han
Rafael Caro Repetto
Dasaem Jeong
58
4
0
04 Aug 2023
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Kai Chen
Yusong Wu
Haohe Liu
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
DiffM
94
81
0
03 Aug 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe, MLLM
124
46
0
30 Jul 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Yifei Xin
Yuexian Zou
121
9
0
28 Jul 2023
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
Zheng Zhang
Zheng Ning
Chenliang Xu
Yapeng Tian
Toby Jia-Jun Li
88
7
0
27 Jul 2023
Complete and separate: Conditional separation with missing target source attribute completion
Dimitrios Bralios
Efthymios Tzinis
Paris Smaragdis
87
0
0
27 Jul 2023
On the Effectiveness of Speech Self-supervised Learning for Music
Yi Ma
Ruibin Yuan
Yizhi Li
Ge Zhang
Xingran Chen
...
Ruibo Liu
Gus Xia
Roger Dannenberg
Yi-Ting Guo
Jie Fu
65
10
0
11 Jul 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Yuan Gong
Sameer Khurana
Leonid Karlinsky
James R. Glass
88
71
0
06 Jul 2023
MomentDiff: Generative Video Moment Retrieval from Random to Real
P. Li
Chen-Wei Xie
Hongtao Xie
Liming Zhao
Lei Zhang
Yun Zheng
Deli Zhao
Yongdong Zhang
DiffM, VGen
111
60
0
06 Jul 2023
Dataset balancing can hurt model performance
R. C. Moore
D. Ellis
Eduardo Fonseca
Shawn Hershey
A. Jansen
Manoj Plakal
80
9
0
30 Jun 2023
Audio Embeddings as Teachers for Music Classification
Yiwei Ding
Alexander Lerch
58
5
0
30 Jun 2023
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
Le Zhuo
Ruibin Yuan
Jiahao Pan
Yi Ma
Yizhi Li
...
Chenghua Lin
Emmanouil Benetos
Wenhu Chen
Wei Xue
Yi-Ting Guo
100
18
0
29 Jun 2023
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Simian Luo
Chuanhao Yan
Chenxu Hu
Hang Zhao
DiffM
105
83
0
29 Jun 2023
Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies
Yuya Yamamoto
38
2
0
22 Jun 2023
Female mosquito detection by means of AI techniques inside release containers in the context of a Sterile Insect Technique program
Javier Naranjo-Alcazar
Jordi Grau-Haro
D. Almenar
P. Zuccarello
46
0
0
19 Jun 2023
MARBLE: Music Audio Representation Benchmark for Universal Evaluation
Ruibin Yuan
Yi Ma
Yizhi Li
Ge Zhang
Xingran Chen
...
Si Liu
Shi Wang
Ruibo Liu
Yi-Ting Guo
Jie Fu
155
34
0
18 Jun 2023
Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances
Huang Xie
Khazar Khorrami
Okko Räsänen
Tuomas Virtanen
47
4
0
16 Jun 2023
Audio Tagging on an Embedded Hardware Platform
Gabriel Bibbó
Arshdeep Singh
Mark D. Plumbley
15
0
0
15 Jun 2023
Enhanced Multimodal Representation Learning with Cross-modal KD
Mengxi Chen
Linyu Xing
Yu Wang
Ya Zhang
60
12
0
13 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT, CLIP
103
28
0
07 Jun 2023
Enhance Temporal Relations in Audio Captioning with Sound Event Detection
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
77
10
0
02 Jun 2023
Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach
Eloi Moliner
Filip Elvander
Vesa Valimaki
DiffM
99
12
0
02 Jun 2023
Adapting a ConvNeXt model to audio classification on AudioSet
Thomas Pellegrini
Ismail Khalfaoui-Hassani
Etienne Labbé
T. Masquelier
90
23
0
01 Jun 2023
Understanding temporally weakly supervised training: A case study for keyword spotting
Heinrich Dinkel
Weiji Zhuang
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
112
0
0
30 May 2023
E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks
Arshdeep Singh
Haohe Liu
Mark D. Plumbley
VLM
58
5
0
30 May 2023
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
Jia-Bin Huang
Yi Ren
Rongjie Huang
Dongchao Yang
Zhenhui Ye
Chen Zhang
Jinglin Liu
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
118
64
0
29 May 2023
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
Xilin Jiang
Yinghao Aaron Li
N. Mesgarani
CLL
45
1
0
29 May 2023
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
83
4
0
29 May 2023
Efficient Neural Music Generation
Max W. Y. Lam
Qiao Tian
Tang-Chun Li
Zongyu Yin
Siyuan Feng
...
Mingbo Ma
Xuchen Song
Jitong Chen
Yuping Wang
Yuxuan Wang
DiffM, MGen
95
56
0
25 May 2023
Real-Time Idling Vehicles Detection using Combined Audio-Visual Deep Learning
Xiwen Li
Tristalee Mangin
Surojit Saha
Evan K. Blanchard
Di Tang
Henry Poppe
Nathan Searle
Ouk Choi
Kerry E Kelly
Ross T. Whitaker
53
6
0
23 May 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
99
10
0
22 May 2023
LEAN: Light and Efficient Audio Classification Network
Shwetank Choudhary
C. Karthik
Punuru Sri Lakshmi
Sumit Kumar
AI4TS
58
5
0
22 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM, MLLM, ObjD
151
122
0
18 May 2023
Listen, Think, and Understand
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM, MLLM, LRM
130
161
0
18 May 2023
Universal Source Separation with Weakly Labelled Data
Qiuqiang Kong
Kai Chen
Haohe Liu
Xingjian Du
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Mark D. Plumbley
82
22
0
11 May 2023