AENet: Learning Deep Audio Features for Video Analysis

IEEE transactions on multimedia (IEEE TMM), 2017
3 January 2017
Naoya Takahashi
Michael Gygli
Luc Van Gool

Papers citing "AENet: Learning Deep Audio Features for Video Analysis"

37 / 37 papers shown
Optimising MFCC parameters for the automatic detection of respiratory diseases
Yuyang Yan
Sami O. Simons
L. V. Bemmel
Lauren Reinders
Frits M E Franssen
V. Urovi
219
0
0
14 Aug 2024
Onset and offset weighted loss function for sound event detection
Tao Song
230
0
0
20 Mar 2024
Zero- and Few-shot Sound Event Localization and Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Kazuki Shimada
Kengo Uchida
Yuichiro Koyama
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
Tatsuya Kawahara
305
16
0
17 Sep 2023
Machine Unlearning: Solutions and Challenges
IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), 2023
Jie Xu
Zihan Wu
Cong Wang
Xiaohua Jia
MU
531
122
0
14 Aug 2023
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers
European Workshop on Visual Information Processing (EUVIP), 2023
Muhammad Bilal Shaikh
Douglas Chai
Syed Mohammed Shamsul Islam
Naveed Akhtar
355
7
0
01 Aug 2023
Joint Moment Retrieval and Highlight Detection Via Natural Language Queries
Richard Luo
Austin Peng
Heidi Yap
Koby Beard
ViT
153
0
0
08 May 2023
Deep Learning Based Multimodal with Two-phase Training Strategy for Daily Life Video Classification
International Conference on Content-Based Multimedia Indexing (CBMI), 2023
L. D. Pham
T. Le
Cam Le
Dat Ngo
Axel Weissenfeld
Alexander Schindler
160
3
0
30 Apr 2023
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
250
19
0
19 Nov 2022
MAiVAR: Multimodal Audio-Image and Video Action Recognizer
Visual Communications and Image Processing (VCIP), 2022
Muhammad Bilal Shaikh
Douglas Chai
S. Islam
Naveed Akhtar
232
6
0
11 Sep 2022
Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion
Sen Chen
Zhilei Liu
Jiaxing Liu
Longbiao Wang
177
6
0
27 Apr 2022
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Naoya Takahashi
E. Tsunoo
Yuki Mitsufuji
200
104
0
14 Oct 2021
Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
Yuichiro Koyama
Kazuhide Shigemi
Masafumi Takahashi
Kazuki Shimada
Naoya Takahashi
E. Tsunoo
Shusuke Takahashi
Yuki Mitsufuji
296
15
0
13 Oct 2021
Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Ricardo Falcón Pérez
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Yuki Mitsufuji
253
6
0
12 Oct 2021
Imbalanced Big Data Oversampling: Taxonomy, Algorithms, Software, Guidelines and Future Directions
ACM Computing Surveys (CSUR), 2021
W. Sleeman
R. Kapoor
AI4TS
306
101
0
24 Jul 2021
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
IEEE transactions on multimedia (IEEE Trans. Multimedia), 2021
Sanchita Ghose
John J. Prevost
GAN
178
34
0
20 Jul 2021
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection
Kazuki Shimada
Naoya Takahashi
Yuichiro Koyama
Shusuke Takahashi
E. Tsunoo
Masafumi Takahashi
Yuki Mitsufuji
163
28
0
21 Jun 2021
Deep Learning Frameworks Applied For Audio-Visual Scene Classification
L. D. Pham
Alexander Schindler
Mina Schütz
Jasmin Lampert
S. Schlarb
Ross King
131
9
0
12 Jun 2021
RelationTrack: Relation-aware Multiple Object Tracking with Decoupled Representation
IEEE transactions on multimedia (IEEE Trans. Multimedia), 2021
En Yu
Zhuoling Li
Shoudong Han
Hongwei Wang
VOT
248
179
0
10 May 2021
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Qing Wang
Jun Du
Hua-Xin Wu
Jia Pan
Feng Ma
Chin-Hui Lee
242
121
0
08 Jan 2021
Densely connected multidilated convolutional networks for dense prediction tasks
Computer Vision and Pattern Recognition (CVPR), 2020
Naoya Takahashi
Yuki Mitsufuji
3DV
260
75
0
21 Nov 2020
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
379
127
0
29 Oct 2020
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net
Kazuki Shimada
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
317
20
0
22 Jun 2020
EDropout: Energy-Based Dropout and Pruning of Deep Neural Networks
Hojjat Salehinejad
S. Valaee
416
52
0
07 Jun 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning
IEEE transactions on multimedia (TMM), 2020
Sanchita Ghose
John J. Prevost
VGen
257
54
0
21 Feb 2020
SummaryNet: A Multi-Stage Deep Learning Model for Automatic Video Summarisation
Ziyad Jappie
David Torpey
Turgay Celik
109
3
0
19 Feb 2020
Data augmentation approaches for improving animal audio classification
Ecological Informatics (EI), 2019
L. Nanni
Gianluca Maguolo
M. Paci
229
165
0
16 Dec 2019
Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation
Applied Computing and Informatics (ACI), 2019
Gianluca Maguolo
M. Paci
L. Nanni
Lu Bonan
203
21
0
11 Dec 2019
Improving Voice Separation by Incorporating End-to-end Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Naoya Takahashi
M. Singh
Sakya Basak
Sudarsanam Parthasaarathy
Sriram Ganapathy
Yuki Mitsufuji
VLM
200
19
0
29 Nov 2019
An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy
Interspeech (Interspeech), 2019
Jiaxu Chen
Jing Hao
Kai Chen
Di Xie
Shicai Yang
Shiliang Pu
AI4TS
243
3
0
21 Nov 2019
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
249
30
0
16 Apr 2019
Audio-Visual Scene-Aware Dialog
Huda AlAmri
Vincent Cartillier
Abhishek Das
Jue Wang
A. Cherian
...
Tim K. Marks
Chiori Hori
Peter Anderson
Stefan Lee
Devi Parikh
VGen
408
219
0
25 Jan 2019
Cross-domain Deep Feature Combination for Bird Species Classification with Audio-visual Data
B. Naranchimeg
Chao Zhang
T. Akashi
110
18
0
26 Nov 2018
Listening for Sirens: Locating and Classifying Acoustic Alarms in City Scenes
Letizia Marchegiani
Paul Newman
171
49
0
11 Oct 2018
MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation
Naoya Takahashi
Nabarun Goswami
Yuki Mitsufuji
241
150
0
07 May 2018
Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks
S. Jalalifar
Hosein Hasani
H. Aghajan
CVBM
GAN
166
26
0
20 Mar 2018
Multi-scale Multi-band DenseNets for Audio Source Separation
Naoya Takahashi
Yuki Mitsufuji
223
162
0
29 Jun 2017
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Che-Wei Huang
Shrikanth S. Narayanan
HAI
166
28
0
07 Jun 2017