AENet: Learning Deep Audio Features for Video Analysis

IEEE Transactions on Multimedia (IEEE TMM), 2017

3 January 2017

Naoya Takahashi

Michael Gygli

Luc Van Gool

Papers citing "AENet: Learning Deep Audio Features for Video Analysis"

37 papers

Optimising MFCC parameters for the automatic detection of respiratory diseases

14 Aug 2024

Onset and offset weighted loss function for sound event detection

Tao Song

20 Mar 2024

Zero- and Few-shot Sound Event Localization and Detection

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

17 Sep 2023

Machine Unlearning: Solutions and Challenges

IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), 2023

14 Aug 2023

MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers

European Workshop on Visual Information Processing (EUVIP), 2023

Muhammad Bilal Shaikh

Douglas Chai

Syed Mohammed Shamsul Islam

Naveed Akhtar

01 Aug 2023

Joint Moment Retrieval and Highlight Detection Via Natural Language Queries

08 May 2023

Deep Learning Based Multimodal with Two-phase Training Strategy for Daily Life Video Classification

International Conference on Content-Based Multimedia Indexing (CBMI), 2023

30 Apr 2023

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Chenye Cui

Yi Ren

Jinglin Liu

Rongjie Huang

Zhou Zhao

19 Nov 2022

MAiVAR: Multimodal Audio-Image and Video Action Recognizer

Visual Communications and Image Processing (VCIP), 2022

Muhammad Bilal Shaikh

Douglas Chai

S. Islam

Naveed Akhtar

11 Sep 2022

Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion

Sen Chen

Zhilei Liu

Jiaxing Liu

Longbiao Wang

27 Apr 2022

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training

14 Oct 2021

Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

13 Oct 2021

Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

12 Oct 2021

Imbalanced Big Data Oversampling: Taxonomy, Algorithms, Software, Guidelines and Future Directions

ACM Computing Surveys (CSUR), 2021

W. Sleeman

R. Kapoor

24 Jul 2021

FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos

IEEE Transactions on Multimedia (IEEE Trans. Multimedia), 2021

Sanchita Ghose

John J. Prevost

20 Jul 2021

Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection

21 Jun 2021

Deep Learning Frameworks Applied For Audio-Visual Scene Classification

12 Jun 2021

RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation

IEEE Transactions on Multimedia (IEEE Trans. Multimedia), 2021

10 May 2021

A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021

Qing Wang

Jun Du

Hua-Xin Wu

Jia Pan

Feng Ma

Chin-Hui Lee

08 Jan 2021

Densely connected multidilated convolutional networks for dense prediction tasks

Computer Vision and Pattern Recognition (CVPR), 2020

Naoya Takahashi

Yuki Mitsufuji

21 Nov 2020

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

29 Oct 2020

Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net

22 Jun 2020

EDropout: Energy-Based Dropout and Pruning of Deep Neural Networks

Hojjat Salehinejad

S. Valaee

07 Jun 2020

AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning

IEEE Transactions on Multimedia (TMM), 2020

Sanchita Ghose

John J. Prevost

21 Feb 2020

SummaryNet: A Multi-Stage Deep Learning Model for Automatic Video Summarisation

Ziyad Jappie

David Torpey

Turgay Celik

19 Feb 2020

Data augmentation approaches for improving animal audio classification

Ecological Informatics (EI), 2019

L. Nanni

Gianluca Maguolo

M. Paci

16 Dec 2019

Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation

Applied Computing and Informatics (ACI), 2019

11 Dec 2019

Improving Voice Separation by Incorporating End-to-end Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

Naoya Takahashi

M. Singh

Sakya Basak

Sudarsanam Parthasaarathy

Sriram Ganapathy

Yuki Mitsufuji

29 Nov 2019

An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy

Interspeech, 2019

21 Nov 2019

Audio-Visual Model Distillation Using Acoustic Images

16 Apr 2019

Audio-Visual Scene-Aware Dialog

...

Devi Parikh

25 Jan 2019

Cross-domain Deep Feature Combination for Bird Species Classification with Audio-visual Data

B. Naranchimeg

Chao Zhang

T. Akashi

26 Nov 2018

Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes

Letizia Marchegiani

Paul Newman

11 Oct 2018

MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation

Naoya Takahashi

Nabarun Goswami

Yuki Mitsufuji

07 May 2018

Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks

20 Mar 2018

Multi-scale Multi-band DenseNets for Audio Source Separation

Naoya Takahashi

Yuki Mitsufuji

29 Jun 2017

Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition

Che-Wei Huang

Shrikanth S. Narayanan

07 Jun 2017