ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1609.09430
  4. Cited By
CNN Architectures for Large-Scale Audio Classification

CNN Architectures for Large-Scale Audio Classification

29 September 2016
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
R. C. Moore
Manoj Plakal
D. Platt
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
ArXivPDFHTML

Papers citing "CNN Architectures for Large-Scale Audio Classification"

50 / 298 papers shown
Title
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François G. Germain
Michael J. Jones
Moitreya Chatterjee
21
0
0
14 May 2025
Discrete Optimal Transport and Voice Conversion
Discrete Optimal Transport and Voice Conversion
Anton Selitskiy
Maitreya Kocharekar
OT
75
0
0
07 May 2025
Efficient Continual Learning in Keyword Spotting using Binary Neural Networks
Efficient Continual Learning in Keyword Spotting using Binary Neural Networks
Quynh Nguyen Phuong Vu
Luciano S. Martinez-Rau
Yuxuan Zhang
Nho-Duc Tran
Bengt Oelmann
Michele Magno
Sebastian Bader
CLL
38
0
0
05 May 2025
DOSE : Drum One-Shot Extraction from Music Mixture
DOSE : Drum One-Shot Extraction from Music Mixture
Suntae Hwang
Seonghyeon Kang
Kyungsu Kim
Semin Ahn
K. Lee
36
0
0
25 Apr 2025
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak Nadjar Araabi
130
0
0
14 Apr 2025
LoopGen: Training-Free Loopable Music Generation
LoopGen: Training-Free Loopable Music Generation
Davide Marincione
Giorgio Strano
Donato Crisostomi
Roberto Ribuoli
Emanuele Rodolà
MGen
60
0
0
06 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao W. Wang
Songruoyao Wu
Jiaxing Yu
K. Zhang
MGen
VGen
70
1
0
01 Apr 2025
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
E. Shaar
Ariel Shaulov
Gal Chechik
Lior Wolf
VLM
41
0
0
17 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
67
3
0
13 Mar 2025
Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation
Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation
Han Liu
Yinwei Wei
Fan Liu
W. Wang
Liqiang Nie
Tat-Seng Chua
48
17
0
13 Jan 2025
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
Ruben Ciranni
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Giorgio Fabbro
Emanuele Rodolà
Luca Cosmo
74
7
0
10 Jan 2025
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
Onkar Susladkar
Jishu Sen Gupta
Chirag Sehgal
Sparsh Mittal
Rekha Singhal
DiffM
VGen
40
0
0
10 Oct 2024
Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation
Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation
Ivan Rinaldi
Nicola Fanelli
Giovanna Castellano
G. Vessio
31
2
0
07 Oct 2024
Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling
Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling
Yuanchao Li
Zixing Zhang
Jing Han
P. Bell
Catherine Lai
69
0
0
25 Sep 2024
Generalization in birdsong classification: impact of transfer learning
  methods and dataset characteristics
Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics
Burooj Ghani
Vincent J. Kalkman
Bob Planqué
Willem-Pier Vellinga
L. Gill
Dan Stowell
VLM
32
5
0
21 Sep 2024
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework
Yuhang Jia
Yang Chen
Jinghua Zhao
Shiwan Zhao
Wenjia Zeng
Yong Chen
Yong Qin
DiffM
33
1
0
19 Sep 2024
ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning
ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning
Daewoong Kim
Hao-Wen Dong
Dasaem Jeong
23
0
0
19 Sep 2024
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound
  Event Detection
The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection
Gabriel Bibbó
Thomas Deacon
Arshdeep Singh
Mark D. Plumbley
21
0
0
17 Sep 2024
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
34
0
0
17 Sep 2024
MambaFoley: Foley Sound Generation using Selective State-Space Models
MambaFoley: Foley Sound Generation using Selective State-Space Models
Marco Furio Colombo
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
Mamba
25
1
0
13 Sep 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
45
1
0
12 Sep 2024
Deep Learning for Video Anomaly Detection: A Review
Deep Learning for Video Anomaly Detection: A Review
Peng Wu
Chengyu Pan
Yuting Yan
Guansong Pang
Peng Wang
Yanning Zhang
VLM
AI4TS
42
6
0
09 Sep 2024
AudioInsight: Detecting Social Contexts Relevant to Social Anxiety from
  Speech
AudioInsight: Detecting Social Contexts Relevant to Social Anxiety from Speech
Varun Reddy
Zhiyuan Wang
Emma R. Toner
Max Larrazabal
M. Boukhechba
B. Teachman
Laura E. Barnes
33
4
0
19 Jul 2024
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen
Chong Wang
Yuyuan Liu
Hu Wang
Gustavo Carneiro
40
2
0
07 Jul 2024
Subtractive Training for Music Stem Insertion using Latent Diffusion Models
Subtractive Training for Music Stem Insertion using Latent Diffusion Models
Ivan Villa-Renteria
Mason L. Wang
Zachary Shah
Zhe Li
Soohyun Kim
Neelesh Ramachandran
Mert Pilanci
42
0
0
27 Jun 2024
Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation
  for Embedding Undetectable Vulnerabilities on Speech Recognition
Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition
Wenhan Yao
Jiangkun Yang
yongqiang He
Jia Liu
Weiping Wen
44
1
0
16 Jun 2024
Contrastive Learning from Synthetic Audio Doppelgängers
Contrastive Learning from Synthetic Audio Doppelgängers
Manuel Cherep
Nikhil Singh
40
1
0
09 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Y. Guo
VGen
102
16
0
06 Jun 2024
LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in
  the Wild
LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild
Lang He
Kai Chen
Junnan Zhao
Yimeng Wang
Ercheng Pei
...
Shiqing Zhang
Jie Zhang
Zhongmin Wang
Tao He
Prayag Tiwari
50
3
0
09 May 2024
AudioRepInceptionNeXt: A lightweight single-stream architecture for
  efficient audio recognition
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition
Kin Wai Lau
Yasar Abbas Ur Rehman
L. Po
44
1
0
21 Apr 2024
Guided Masked Self-Distillation Modeling for Distributed Multimedia
  Sensor Event Analysis
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis
Masahiro Yasuda
Noboru Harada
Yasunori Ohishi
Shoichiro Saito
Akira Nakayama
Nobutaka Ono
36
3
0
12 Apr 2024
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large
  Multi-Modal Models
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models
David Kurzendörfer
Otniel-Bogdan Mercea
A. Sophia Koepke
Zeynep Akata
VLM
CLIP
31
2
0
09 Apr 2024
MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in
  Conversations with Multimodal Language Models
MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models
Zebang Cheng
Fuqiang Niu
Yuxiang Lin
Zhi-Qi Cheng
Bowen Zhang
Xiaojiang Peng
31
7
0
31 Mar 2024
Correlation of Fréchet Audio Distance With Human Perception of
  Environmental Audio Is Embedding Dependant
Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant
Modan Tailleur
Junwon Lee
Mathieu Lagrange
Keunwoo Choi
Laurie M. Heller
Keisuke Imoto
Yuki Okamoto
22
10
0
26 Mar 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya-Qin Zhang
Yanfeng Wang
39
10
0
17 Mar 2024
RADIA -- Radio Advertisement Detection with Intelligent Analytics
RADIA -- Radio Advertisement Detection with Intelligent Analytics
Jorge Álvarez
J. C. Armenteros
Camilo Torrón
Miguel Ortega-Martín
Alfonso Ardoiz
...
Íñigo Galdeano
Ignacio Garrido
Adrián Alonso
Fernando Bayón
Oleg Vorontsov
26
0
0
06 Mar 2024
Hybrid Modeling Design Patterns
Hybrid Modeling Design Patterns
Maja Rudolph
Stefan Kurz
Barbara Rakitsch
AI4CE
31
8
0
29 Dec 2023
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
Sunjae Yoon
Dahyun Kim
Eunseop Yoon
Hee Suk Yoon
Junyeong Kim
C. Yoo
39
6
0
15 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
25
17
0
27 Nov 2023
AQUATK: An Audio Quality Assessment Toolkit
AQUATK: An Audio Quality Assessment Toolkit
Ashvala Vinay
Alexander Lerch
13
2
0
16 Nov 2023
Soundbay: Deep Learning Framework for Marine Mammals and Bioacoustic
  Research
Soundbay: Deep Learning Framework for Marine Mammals and Bioacoustic Research
Noam Bressler
Michael Faran
Amit Galor
Michael Moshe Michelashvili
Tomer Nachshon
Noa Weiss
28
0
0
07 Nov 2023
Audio-Visual Instance Segmentation
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
28
2
0
28 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai
37
34
0
12 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
30
3
0
10 Oct 2023
Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density
  Estimation with Non-speech Audio
Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio
Forsad Al Hossain
Tanjid Hasan Tonmoy
A. Lover
George A. Corey
Mohammad Arif Ul Alam
Tauhidur Rahman
14
1
0
19 Sep 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
33
0
0
30 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy
  Disentanglement
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
29
6
0
23 Aug 2023
An Ambient Intelligence-based Approach For Longitudinal Monitoring of
  Verbal and Vocal Depression Symptoms
An Ambient Intelligence-based Approach For Longitudinal Monitoring of Verbal and Vocal Depression Symptoms
Alice Othmani
M. Muzammel
14
1
0
16 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised
  Pretraining
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
25
222
0
10 Aug 2023
Contrastive Conditional Latent Diffusion for Audio-visual Segmentation
Contrastive Conditional Latent Diffusion for Audio-visual Segmentation
Yuxin Mao
Jing Zhang
Mochu Xiang
Yun-Qiu Lv
Yiran Zhong
Yuchao Dai
DiffM
43
28
0
31 Jul 2023
123456
Next