1701.00599
AENet: Learning Deep Audio Features for Video Analysis
IEEE Transactions on Multimedia (IEEE TMM), 2017
3 January 2017
Naoya Takahashi
Michael Gygli
Luc Van Gool
Papers citing
"AENet: Learning Deep Audio Features for Video Analysis"
37 / 37 papers shown
Optimising MFCC parameters for the automatic detection of respiratory diseases
Yuyang Yan
Sami O. Simons
L. V. Bemmel
Lauren Reinders
Frits M E Franssen
V. Urovi
219
0
0
14 Aug 2024
Onset and offset weighted loss function for sound event detection
Tao Song
230
0
0
20 Mar 2024
Zero- and Few-shot Sound Event Localization and Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Kazuki Shimada
Kengo Uchida
Yuichiro Koyama
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
Tatsuya Kawahara
305
16
0
17 Sep 2023
Machine Unlearning: Solutions and Challenges
IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), 2023
Jie Xu
Zihan Wu
Cong Wang
Xiaohua Jia
MU
531
122
0
14 Aug 2023
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers
European Workshop on Visual Information Processing (EUVIP), 2023
Muhammad Bilal Shaikh
Douglas Chai
Syed Mohammed Shamsul Islam
Naveed Akhtar
355
7
0
01 Aug 2023
Joint Moment Retrieval and Highlight Detection Via Natural Language Queries
Richard Luo
Austin Peng
Heidi Yap
Koby Beard
ViT
153
0
0
08 May 2023
Deep Learning Based Multimodal with Two-phase Training Strategy for Daily Life Video Classification
International Conference on Content-Based Multimedia Indexing (CBMI), 2023
L. D. Pham
T. Le
Cam Le
Dat Ngo
Axel Weissenfeld
Alexander Schindler
160
3
0
30 Apr 2023
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
250
19
0
19 Nov 2022
MAiVAR: Multimodal Audio-Image and Video Action Recognizer
Visual Communications and Image Processing (VCIP), 2022
Muhammad Bilal Shaikh
Douglas Chai
S. Islam
Naveed Akhtar
232
6
0
11 Sep 2022
Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion
Sen Chen
Zhilei Liu
Jiaxing Liu
Longbiao Wang
177
6
0
27 Apr 2022
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Naoya Takahashi
E. Tsunoo
Yuki Mitsufuji
200
104
0
14 Oct 2021
Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
Yuichiro Koyama
Kazuhide Shigemi
Masafumi Takahashi
Kazuki Shimada
Naoya Takahashi
E. Tsunoo
Shusuke Takahashi
Yuki Mitsufuji
296
15
0
13 Oct 2021
Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Ricardo Falcón Pérez
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Yuki Mitsufuji
253
6
0
12 Oct 2021
Imbalanced Big Data Oversampling: Taxonomy, Algorithms, Software, Guidelines and Future Directions
ACM Computing Surveys (CSUR), 2021
W. Sleeman
R. Kapoor
AI4TS
306
101
0
24 Jul 2021
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
IEEE Transactions on Multimedia (IEEE Trans. Multimedia), 2021
Sanchita Ghose
John J. Prevost
GAN
178
34
0
20 Jul 2021
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection
Kazuki Shimada
Naoya Takahashi
Yuichiro Koyama
Shusuke Takahashi
E. Tsunoo
Masafumi Takahashi
Yuki Mitsufuji
163
28
0
21 Jun 2021
Deep Learning Frameworks Applied For Audio-Visual Scene Classification
L. D. Pham
Alexander Schindler
Mina Schütz
Jasmin Lampert
S. Schlarb
Ross King
131
9
0
12 Jun 2021
RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation
IEEE Transactions on Multimedia (IEEE Trans. Multimedia), 2021
En Yu
Zhuoling Li
Shoudong Han
Hongwei Wang
VOT
248
179
0
10 May 2021
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Qing Wang
Jun Du
Hua-Xin Wu
Jia Pan
Feng Ma
Chin-Hui Lee
242
121
0
08 Jan 2021
Densely connected multidilated convolutional networks for dense prediction tasks
Computer Vision and Pattern Recognition (CVPR), 2020
Naoya Takahashi
Yuki Mitsufuji
3DV
260
75
0
21 Nov 2020
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
379
127
0
29 Oct 2020
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net
Kazuki Shimada
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
317
20
0
22 Jun 2020
EDropout: Energy-Based Dropout and Pruning of Deep Neural Networks
Hojjat Salehinejad
S. Valaee
416
52
0
07 Jun 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning
IEEE Transactions on Multimedia (TMM), 2020
Sanchita Ghose
John J. Prevost
VGen
257
54
0
21 Feb 2020
SummaryNet: A Multi-Stage Deep Learning Model for Automatic Video Summarisation
Ziyad Jappie
David Torpey
Turgay Celik
109
3
0
19 Feb 2020
Data augmentation approaches for improving animal audio classification
Ecological Informatics (EI), 2019
L. Nanni
Gianluca Maguolo
M. Paci
229
165
0
16 Dec 2019
Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation
Applied Computing and Informatics (ACI), 2019
Gianluca Maguolo
M. Paci
L. Nanni
Lu Bonan
203
21
0
11 Dec 2019
Improving Voice Separation by Incorporating End-to-end Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Naoya Takahashi
M. Singh
Sakya Basak
Sudarsanam Parthasaarathy
Sriram Ganapathy
Yuki Mitsufuji
VLM
200
19
0
29 Nov 2019
An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy
Interspeech (Interspeech), 2019
Jiaxu Chen
Jing Hao
Kai Chen
Di Xie
Shicai Yang
Shiliang Pu
AI4TS
243
3
0
21 Nov 2019
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
249
30
0
16 Apr 2019
Audio-Visual Scene-Aware Dialog
Huda AlAmri
Vincent Cartillier
Abhishek Das
Jue Wang
A. Cherian
...
Tim K. Marks
Chiori Hori
Peter Anderson
Stefan Lee
Devi Parikh
VGen
408
219
0
25 Jan 2019
Cross-domain Deep Feature Combination for Bird Species Classification with Audio-visual Data
B. Naranchimeg
Chao Zhang
T. Akashi
110
18
0
26 Nov 2018
Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes
Letizia Marchegiani
Paul Newman
171
49
0
11 Oct 2018
MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation
Naoya Takahashi
Nabarun Goswami
Yuki Mitsufuji
241
150
0
07 May 2018
Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks
S. Jalalifar
Hosein Hasani
H. Aghajan
CVBM
GAN
166
26
0
20 Mar 2018
Multi-scale Multi-band DenseNets for Audio Source Separation
Naoya Takahashi
Yuki Mitsufuji
223
162
0
29 Jun 2017
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Che-Wei Huang
Shrikanth S. Narayanan
HAI
166
28
0
07 Jun 2017