ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.
Versions: v1, v2, v3, v4, v5 (latest)

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM, SSL
ArXiv (abs) | PDF | HTML | GitHub (1475★)

Papers citing "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"

50 / 545 papers shown
Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition
Dionyssos Kounadis-Bastian
Oliver Schrufer
Anna Derington
H. Wierstorf
F. Eyben
Felix Burkhardt
Björn Schuller
90
1
0
25 Aug 2024
On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning
Tiago Tavares
Fabio Ayres
Zhepei Wang
Paris Smaragdis
VLM
50
2
0
23 Aug 2024
QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval
Chenghua Gao
Min Li
Jianshuo Liu
Junxing Ren
Lin Chen
Haoyu Liu
Bo Meng
Jitao Fu
Wenwen Su
50
0
0
23 Aug 2024
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching
Jingyu Liu
Minquan Wang
Ye Ma
Bo Wang
Aozhu Chen
Quan Chen
Peng Jiang
Xirong Li
131
1
0
23 Aug 2024
Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound
Junwon Lee
Jaekwon Im
Dabin Kim
Juhan Nam
VGen
129
10
0
21 Aug 2024
ICSD: An Open-source Dataset for Infant Cry and Snoring Detection
Qingyu Liu
Longfei Song
Dongxing Xu
Yanhua Long
90
0
0
20 Aug 2024
CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization
Xiang He
Xiangxi Liu
Yang Li
Dongcheng Zhao
Guobin Shen
Qingqun Kong
Xin Yang
Yi Zeng
109
6
0
04 Aug 2024
Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation
M. Gauy
Natalia Hitomi Koza
Ricardo Mikio Morita
Gabriel Rocha Stanzione
Arnaldo Cândido Júnior
L. Berti
A. S. Levin
E. Sabino
F. Svartman
Marcelo Finger
54
0
0
30 Jul 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
122
7
0
30 Jul 2024
Start from Video-Music Retrieval: An Inter-Intra Modal Loss for Cross Modal Retrieval
Zeyu Chen
Pengfei Zhang
Kai Ye
Wei Dong
Xin Feng
Yana Zhang
69
0
0
28 Jul 2024
I can listen but cannot read: An evaluation of two-tower multimodal systems for instrument recognition
Yannis Vasilakis
Rachel M. Bittner
Johan Pauwels
85
1
0
25 Jul 2024
Computer Audition: From Task-Specific Machine Learning to Foundation Models
Andreas Triantafyllopoulos
Iosif Tsangko
Alexander Gebhard
A. Mesaros
Tuomas Virtanen
Björn Schuller
96
4
0
22 Jul 2024
Integrating IP Broadcasting with Audio Tags: Workflow and Challenges
Rhys Burchett-Vass
Arshdeep Singh
Gabriel Bibbó
Mark D. Plumbley
68
0
0
22 Jul 2024
Stable Audio Open
Zach Evans
Julian Parker
CJ Carr
Zack Zukowski
Josiah Taylor
Jordi Pons
257
53
0
19 Jul 2024
Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models
Xuenan Xu
Pingyue Zhang
Ming Yan
Ji Zhang
Mengyue Wu
VLM
126
0
0
19 Jul 2024
Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
Junqi Zhao
Xubo Liu
Jinzheng Zhao
Yiitan Yuan
Qiuqiang Kong
Mark D. Plumbley
Wenwu Wang
75
4
0
16 Jul 2024
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
60
1
0
11 Jul 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Keyu An
Qian Chen
Chong Deng
Zhihao Du
Changfeng Gao
...
Bin Zhang
Qinglin Zhang
Shiliang Zhang
Nan Zhao
Siqi Zheng
AuLLM
139
57
0
04 Jul 2024
Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
Oliver Schrufer
M. Milling
Felix Burkhardt
F. Eyben
Björn Schuller
65
3
0
01 Jul 2024
Subtractive Training for Music Stem Insertion using Latent Diffusion Models
Ivan Villa-Renteria
Mason L. Wang
Zachary Shah
Zhe Li
Soohyun Kim
Neelesh Ramachandran
Mert Pilanci
179
0
0
27 Jun 2024
Exploring compressibility of transformer based text-to-music (TTM) models
Vasileios Moschopoulos
Thanasis Kotsiopoulos
Pablo Peso Parada
Konstantinos Nikiforidis
Alexandros Stergiadis
Gerasimos Papakostas
Md. Asif Jalal
Jisi Zhang
Anastasios Drosou
Karthikeyan P. Saravanan
47
0
0
24 Jun 2024
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking
Yuwei Zhang
Tong Xia
Jing Han
Yu Wu
Georgios Rizos
Yang Liu
Mohammed Mosuily
Jagmohan Chauhan
Cecilia Mascolo
AI4CE
69
11
0
23 Jun 2024
LARP: Language Audio Relational Pre-training for Cold-Start Playlist Continuation
Rebecca Salganik
Xiaohao Liu
Yunshan Ma
Jian Kang
Tat-Seng Chua
CLL
92
2
0
20 Jun 2024
DASB -- Discrete Audio and Speech Benchmark
Pooneh Mousavi
Luca Della Libera
J. Duret
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
90
21
0
20 Jun 2024
Online Domain-Incremental Learning Approach to Classify Acoustic Scenes in All Locations
Manjunath Mulimani
A. Mesaros
CLL
58
1
0
19 Jun 2024
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
Jizhong Liu
Gang Li
Junbo Zhang
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Yujun Wang
Bin Wang
AuLLM
135
5
0
19 Jun 2024
Improving Text-To-Audio Models with Synthetic Captions
Zhifeng Kong
Sang-gil Lee
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Rafael Valle
Soujanya Poria
Bryan Catanzaro
110
13
0
18 Jun 2024
MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Ruibo Fu
Shuchen Shi
Hongming Guo
Tao Wang
Chunyu Qiang
...
Zhiyong Wang
Yukun Liu
Xuefei Liu
Shuai Zhang
Guanjun Li
VGen
42
0
0
15 Jun 2024
Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice
Shubham Gupta
Mirco Ravanelli
Pascal Germain
Cem Subakan
FAtt
65
4
0
14 Jun 2024
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
Qijun Gan
Song Wang
Shengtao Wu
Jianke Zhu
264
1
0
13 Jun 2024
ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis
Yunyi Liu
Craig Jin
69
0
0
11 Jun 2024
Towards better visualizations of urban sound environments: insights from interviews
Modan Tailleur
Pierre Aumond
Vincent Tourre
Mathieu Lagrange
23
0
0
11 Jun 2024
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition
Andreas Triantafyllopoulos
A. Batliner
Simon Rampp
M. Milling
Björn Schuller
VLM
65
1
0
10 Jun 2024
Audio-based Step-count Estimation for Running -- Windowing and Neural Network Baselines
Philipp Wagner
Andreas Triantafyllopoulos
Alexander Gebhard
Björn Schuller
62
0
0
10 Jun 2024
Zero-Shot Audio Captioning Using Soft and Hard Prompts
Yiming Zhang
Xuenan Xu
Ruoyi Du
Haohe Liu
Yuan Dong
Zheng-Hua Tan
Wenwu Wang
Zhanyu Ma
VLM
77
4
0
10 Jun 2024
Soundscape Captioning using Sound Affective Quality Network and Large Language Model
Yuanbo Hou
Qiaoqiao Ren
A. Mitchell
Wenwu Wang
Jian Kang
Tony Belpaeme
Dick Botteldooren
103
3
0
09 Jun 2024
PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
Shuchen Shi
Ruibo Fu
Zhengqi Wen
Jianhua Tao
Tao Wang
...
Xuefei Liu
Yukun Liu
Yongwei Li
Zhiyong Wang
Xiaopeng Wang
47
1
0
07 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
84
8
0
07 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Qifeng Chen
Yu Guo
VGen
275
17
0
06 Jun 2024
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning
Mehmet Hamza Erol
Arda Senocak
Jiu Feng
Joon Son Chung
Mamba
139
25
0
05 Jun 2024
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
Masahiro Yasuda
Shunsuke Tsubaki
Keisuke Imoto
VLM
93
7
0
04 Jun 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM, MedIm
102
1
0
31 May 2024
Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI
Che Liu
Changde Du
Xiaoyu Chen
Huiguang He
74
2
0
29 May 2024
Discriminant audio properties in deep learning based respiratory insufficiency detection in Brazilian Portuguese
M. Gauy
L. Berti
Arnaldo Candido
Augusto Camargo Neto
Alfredo Goldman
...
B. Medeiros
Marcelo Queiroz
E. Sabino
F. Svartman
Marcelo Finger
42
1
0
27 May 2024
Listenable Maps for Zero-Shot Audio Classifiers
Francesco Paissan
Luca Della Libera
Mirco Ravanelli
Cem Subakan
105
4
0
27 May 2024
Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
Jiaju Lin
Haoxuan Hu
Mamba
51
9
0
22 May 2024
A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability
Li-Yang Tseng
Tzu-Ling Lin
Hong-Han Shuai
Jen-Wei Huang
Wen-Whei Chang
44
0
0
21 May 2024
AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations
David Xu
63
2
0
17 May 2024
Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation
Manh Luong
Khai Nguyen
Nhat Ho
Reza Haf
D.Q. Phung
Lizhen Qu
72
13
0
16 May 2024
No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation
Qiaoqiao Ren
Yuanbo Hou
Dick Botteldooren
Tony Belpaeme
34
4
0
15 May 2024