AENet: Learning Deep Audio Features for Video Analysis

3 January 2017
Naoya Takahashi
Michael Gygli
Luc Van Gool
arXiv: 1701.00599

Papers citing "AENet: Learning Deep Audio Features for Video Analysis"

20 / 20 papers shown
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers
Muhammad Bilal Shaikh
Douglas Chai
Syed Mohammed Shamsul Islam
Naveed Akhtar
30
5
0
01 Aug 2023
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
38
14
0
19 Nov 2022
MAiVAR: Multimodal Audio-Image and Video Action Recognizer
Muhammad Bilal Shaikh
Douglas Chai
S. Islam
Naveed Akhtar
32
5
0
11 Sep 2022
Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion
Sen Chen
Zhilei Liu
Jiaxing Liu
Longbiao Wang
39
6
0
27 Apr 2022
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Naoya Takahashi
E. Tsunoo
Yuki Mitsufuji
13
63
0
14 Oct 2021
Imbalanced Big Data Oversampling: Taxonomy, Algorithms, Software, Guidelines and Future Directions
W. Sleeman
R. Kapoor
AI4TS
17
71
0
24 Jul 2021
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
Sanchita Ghose
John J. Prevost
GAN
27
26
0
20 Jul 2021
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection
Kazuki Shimada
Naoya Takahashi
Yuichiro Koyama
Shusuke Takahashi
E. Tsunoo
Masafumi Takahashi
Yuki Mitsufuji
30
23
0
21 Jun 2021
RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation
En Yu
Zhuoling Li
Shoudong Han
Hongwei Wang
VOT
53
128
0
10 May 2021
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
Qing Wang
Jun Du
Hua-Xin Wu
Jia Pan
Feng Ma
Chin-Hui Lee
13
79
0
08 Jan 2021
Densely connected multidilated convolutional networks for dense prediction tasks
Naoya Takahashi
Yuki Mitsufuji
3DV
21
64
0
21 Nov 2020
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
23
86
0
29 Oct 2020
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net
Kazuki Shimada
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
16
19
0
22 Jun 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning
Sanchita Ghose
John J. Prevost
VGen
14
46
0
21 Feb 2020
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
15
27
0
16 Apr 2019
Audio-Visual Scene-Aware Dialog
Huda AlAmri
Vincent Cartillier
Abhishek Das
Jue Wang
A. Cherian
...
Tim K. Marks
Chiori Hori
Peter Anderson
Stefan Lee
Devi Parikh
VGen
27
189
0
25 Jan 2019
Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes
Letizia Marchegiani
Paul Newman
15
35
0
11 Oct 2018
MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation
Naoya Takahashi
Nabarun Goswami
Yuki Mitsufuji
39
141
0
07 May 2018
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Che-Wei Huang
Shrikanth S. Narayanan
HAI
27
25
0
07 Jun 2017
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton
Nitish Srivastava
A. Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov
VLM
266
7,639
0
03 Jul 2012