AENet: Learning Deep Audio Features for Video Analysis

3 January 2017

Luc Van Gool

Papers citing "AENet: Learning Deep Audio Features for Video Analysis"

MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers Muhammad Bilal Shaikh Douglas Chai Syed Mohammed Shamsul Islam Naveed Akhtar 30 5 0 01 Aug 2023
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement Chenye Cui Yi Ren Jinglin Liu Rongjie Huang Zhou Zhao VGen 38 14 0 19 Nov 2022
MAiVAR: Multimodal Audio-Image and Video Action Recognizer Muhammad Bilal Shaikh Douglas Chai S. Islam Naveed Akhtar 32 5 0 11 Sep 2022
Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion Sen Chen Zhilei Liu Jiaxing Liu Longbiao Wang 39 6 0 27 Apr 2022
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training Kazuki Shimada Yuichiro Koyama Shusuke Takahashi Naoya Takahashi E. Tsunoo Yuki Mitsufuji 13 63 0 14 Oct 2021
Imbalanced Big Data Oversampling: Taxonomy, Algorithms, Software, Guidelines and Future Directions W. Sleeman R. Kapoor AI4TS 17 71 0 24 Jul 2021
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos Sanchita Ghose John J. Prevost GAN 27 26 0 20 Jul 2021
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection Kazuki Shimada Naoya Takahashi Yuichiro Koyama Shusuke Takahashi E. Tsunoo Masafumi Takahashi Yuki Mitsufuji 30 23 0 21 Jun 2021
RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation En Yu Zhuoling Li Shoudong Han Hongwei Wang VOT 53 128 0 10 May 2021
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection Qing Wang Jun Du Hua-Xin Wu Jia Pan Feng Ma Chin-Hui Lee 13 79 0 08 Jan 2021
Densely connected multidilated convolutional networks for dense prediction tasks Naoya Takahashi Yuki Mitsufuji 3DV 21 64 0 21 Nov 2020
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection Kazuki Shimada Yuichiro Koyama Naoya Takahashi Shusuke Takahashi Yuki Mitsufuji 23 86 0 29 Oct 2020
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net Kazuki Shimada Naoya Takahashi Shusuke Takahashi Yuki Mitsufuji 16 19 0 22 Jun 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning Sanchita Ghose John J. Prevost VGen 14 46 0 21 Feb 2020
Audio-Visual Model Distillation Using Acoustic Images Andrés F. Pérez Valentina Sanguineti Pietro Morerio Vittorio Murino VLM 15 27 0 16 Apr 2019
Audio-Visual Scene-Aware Dialog Huda AlAmri Vincent Cartillier Abhishek Das Jue Wang A. Cherian ... Tim K. Marks Chiori Hori Peter Anderson Stefan Lee Devi Parikh VGen 27 189 0 25 Jan 2019
Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes Letizia Marchegiani Paul Newman 15 35 0 11 Oct 2018
MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation Naoya Takahashi Nabarun Goswami Yuki Mitsufuji 39 141 0 07 May 2018
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition Che-Wei Huang Shrikanth S. Narayanan HAI 27 25 0 07 Jun 2017
Improving neural networks by preventing co-adaptation of feature detectors Geoffrey E. Hinton Nitish Srivastava A. Krizhevsky Ilya Sutskever Ruslan Salakhutdinov VLM 266 7,639 0 03 Jul 2012