1701.00599
AENet: Learning Deep Audio Features for Video Analysis
3 January 2017
Naoya Takahashi
Michael Gygli
Luc Van Gool
Papers citing "AENet: Learning Deep Audio Features for Video Analysis"
20 / 20 papers shown
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers
Muhammad Bilal Shaikh
Douglas Chai
Syed Mohammed Shamsul Islam
Naveed Akhtar
30
5
0
01 Aug 2023
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
38
14
0
19 Nov 2022
MAiVAR: Multimodal Audio-Image and Video Action Recognizer
Muhammad Bilal Shaikh
Douglas Chai
S. Islam
Naveed Akhtar
32
5
0
11 Sep 2022
Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion
Sen Chen
Zhilei Liu
Jiaxing Liu
Longbiao Wang
39
6
0
27 Apr 2022
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Naoya Takahashi
E. Tsunoo
Yuki Mitsufuji
13
63
0
14 Oct 2021
Imbalanced Big Data Oversampling: Taxonomy, Algorithms, Software, Guidelines and Future Directions
W. Sleeman
R. Kapoor
AI4TS
17
71
0
24 Jul 2021
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
Sanchita Ghose
John J. Prevost
GAN
27
26
0
20 Jul 2021
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection
Kazuki Shimada
Naoya Takahashi
Yuichiro Koyama
Shusuke Takahashi
E. Tsunoo
Masafumi Takahashi
Yuki Mitsufuji
30
23
0
21 Jun 2021
RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation
En Yu
Zhuoling Li
Shoudong Han
Hongwei Wang
VOT
53
128
0
10 May 2021
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
Qing Wang
Jun Du
Hua-Xin Wu
Jia Pan
Feng Ma
Chin-Hui Lee
13
79
0
08 Jan 2021
Densely connected multidilated convolutional networks for dense prediction tasks
Naoya Takahashi
Yuki Mitsufuji
3DV
21
64
0
21 Nov 2020
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
23
86
0
29 Oct 2020
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net
Kazuki Shimada
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
16
19
0
22 Jun 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning
Sanchita Ghose
John J. Prevost
VGen
14
46
0
21 Feb 2020
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
15
27
0
16 Apr 2019
Audio-Visual Scene-Aware Dialog
Huda AlAmri
Vincent Cartillier
Abhishek Das
Jue Wang
A. Cherian
...
Tim K. Marks
Chiori Hori
Peter Anderson
Stefan Lee
Devi Parikh
VGen
27
189
0
25 Jan 2019
Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes
Letizia Marchegiani
Paul Newman
15
35
0
11 Oct 2018
MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation
Naoya Takahashi
Nabarun Goswami
Yuki Mitsufuji
39
141
0
07 May 2018
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Che-Wei Huang
Shrikanth S. Narayanan
HAI
27
25
0
07 Jun 2017
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton
Nitish Srivastava
A. Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov
VLM
266
7,639
0
03 Jul 2012