
Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

1 September 2017
Jen-Cheng Hou
Syu-Siang Wang
Ying-Hui Lai
Yu Tsao
Hsiu-Wen Chang
H. Wang

Papers citing "Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks"

45 / 45 papers shown
Diffusion-based Unsupervised Audio-visual Speech Enhancement
Jean-Eudes Ayilo
Mostafa Sadeghi
Romain Serizel
Xavier Alameda-Pineda
DiffM
112
1
0
04 Oct 2024
Audio-Visual Mandarin Electrolaryngeal Speech Voice Conversion
Yung-Lun Chien
Hsin-Hao Chen
Ming-Chi Yen
S. Tsai
Hsin-Min Wang
Yu Tsao
T. Chi
62
1
0
11 Jun 2023
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation
Ruixin Zheng
Yang Ai
Zhenhua Ling
74
10
0
24 May 2023
LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders
Rodrigo Mira
Buye Xu
Jacob Donley
Anurag Kumar
Stavros Petridis
V. Ithapu
Maja Pantic
55
13
0
20 Nov 2022
AudioScopeV2: Audio-Visual Attention Architectures for Calibrated Open-Domain On-Screen Sound Separation
Efthymios Tzinis
Scott Wisdom
Tal Remez
J. Hershey
119
30
0
20 Jul 2022
Improving Visual Speech Enhancement Network by Learning Audio-visual Affinity with Multi-head Attention
Xinmeng Xu
Yang Wang
Jie Jia
Binbin Chen
Dejun Li
50
10
0
30 Jun 2022
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Changan Chen
Carl Schissler
Sanchit Garg
Philip Kobernik
Alexander Clegg
P. Calamia
Dhruv Batra
Philip Robinson
Kristen Grauman
3DGS
105
86
0
16 Jun 2022
EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning
Lichin Chen
Po-Hsun Chen
Richard Tzong-Han Tsai
Yu Tsao
56
8
0
16 Jun 2022
Few-Shot Audio-Visual Learning of Environment Acoustics
Sagnik Majumder
Changan Chen
Ziad Al-Halah
Kristen Grauman
88
55
0
08 Jun 2022
Expression-preserving face frontalization improves visually assisted speech processing
Zhiqi Kang
M. Sadeghi
Radu Horaud
Xavier Alameda-Pineda
CVBM
114
8
0
06 Apr 2022
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
Juan F. Montesinos
V. S. Kadandale
G. Haro
ViT
105
19
0
08 Mar 2022
SpeechPainter: Text-conditioned Speech Inpainting
Zalan Borsos
Matthew Sharifi
Marco Tagliasacchi
93
28
0
15 Feb 2022
Visual Acoustic Matching
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
77
58
0
14 Feb 2022
Towards Robust Real-time Audio-Visual Speech Enhancement
M. Gogate
K. Dashtipour
Amir Hussain
77
3
0
16 Dec 2021
FaVoA: Face-Voice Association Favours Ambiguous Speaker Detection
Hugo C. C. Carneiro
C. Weber
S. Wermter
CVBM
70
7
0
01 Sep 2021
Look Who's Talking: Active Speaker Detection in the Wild
You Jin Kim
Hee-Soo Heo
Soyeon Choe
Soo-Whan Chung
Yoohwan Kwon
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
113
21
0
17 Aug 2021
Multimodal Deep Learning Framework for Image Popularity Prediction on Social Media
Fatma S. Abousaleh
Wen-Huang Cheng
Neng-Hao Yu
Yu Tsao
60
28
0
18 May 2021
Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Jiyoung Lee
Soo-Whan Chung
Sunok Kim
Hong-Goo Kang
Kwanghoon Sohn
59
51
0
25 Mar 2021
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
Ryo Masumura
54
8
0
02 Mar 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
247
202
0
08 Jan 2021
Audio-visual Speech Separation with Adversarially Disentangled Visual Representation
Peng Zhang
Jiaming Xu
Jing Shi
Yunzhe Hao
Bo Xu
377
5
0
29 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
83
71
0
02 Nov 2020
Listening to Sounds of Silence for Speech Denoising
Ruilin Xu
Rundi Wu
Y. Ishiwaka
Carl Vondrick
Changxi Zheng
66
33
0
22 Oct 2020
Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement
Hang Chen
Jun Du
Yu Hu
Lirong Dai
Baocai Yin
Chin-Hui Lee
86
20
0
21 Sep 2020
SEANet: A Multi-modal Speech Enhancement Network
Marco Tagliasacchi
Yunpeng Li
Karolis Misiunas
Dominik Roblek
82
73
0
04 Sep 2020
Improved Lite Audio-Visual Speech Enhancement
Shang-Yi Chuang
Hsin-Min Wang
Yu Tsao
94
34
0
30 Aug 2020
Incorporating Broad Phonetic Information for Speech Enhancement
Yen-Ju Lu
Chien-Feng Liao
Xugang Lu
J. Hung
Yu Tsao
54
14
0
13 Aug 2020
Lite Audio-Visual Speech Enhancement
Shang-Yi Chuang
Yu Tsao
Chen-Chou Lo
Hsin-Min Wang
125
26
0
24 May 2020
NAUTILUS: a Versatile Voice Cloning System
Hieu-Thi Luong
Junichi Yamagishi
93
53
0
22 May 2020
Discriminative Multi-modality Speech Recognition
Bo Xu
Cheng Lu
Yandong Guo
Jacob Wang
91
99
0
12 May 2020
Artificial neural networks in action for an automated cell-type classification of biological neural networks
Eirini Troullinou
G. Tsagkatakis
Spyridon Chavlis
G. Turi
Wen-Ke Li
A. Losonczy
P. Tsakalides
Panayiota Poirazi
48
13
0
22 Nov 2019
Time-Domain Multi-modal Bone/air Conducted Speech Enhancement
Cheng Yu
Kuo-Hsuan Hung
Syu-Siang Wang
Szu-Wei Fu
Yu Tsao
J. Hung
74
35
0
22 Nov 2019
MMTM: Multimodal Transfer Module for CNN Fusion
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
113
284
0
20 Nov 2019
CochleaNet: A Robust Language-independent Audio-Visual Model for Speech Enhancement
M. Gogate
K. Dashtipour
Ahsan Adeel
Amir Hussain
50
53
0
23 Sep 2019
Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation
Wei Wei
Ling Cheng
Xian-Ling Mao
Guangyou Zhou
Feida Zhu
DiffM
79
19
0
05 Sep 2019
Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders
M. Sadeghi
Simon Leglaive
Xavier Alameda-Pineda
Laurent Girin
Radu Horaud
DiffM
108
66
0
07 Aug 2019
My lips are concealed: Audio-visual speech enhancement through obstructions
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
65
91
0
11 Jul 2019
Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect
Daniel Michelsanti
Zheng-Hua Tan
S. Sigurðsson
Jesper Jensen
83
36
0
29 May 2019
LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks
Swalpa Kumar Roy
Suvojit Manna
S. Dubey
B. B. Chaudhuri
95
50
0
01 Jan 2019
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
Yufei Wang
Luca Pasa
Lantao Yu
Rohit Singh
Luciano Fadiga
L. Joppa
CVBM
69
60
0
06 Nov 2018
Contextual Audio-Visual Switching For Speech Enhancement in Real-World Environments
Ahsan Adeel
M. Gogate
Amir Hussain
61
52
0
28 Aug 2018
Lip-Reading Driven Deep Learning Approach for Speech Enhancement
Ahsan Adeel
M. Gogate
Amir Hussain
W. Whitmer
79
65
0
31 Jul 2018
The Conversation: Deep Audio-Visual Speech Enhancement
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
93
360
0
11 Apr 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
139
754
0
10 Apr 2018
The History Began from AlexNet: A Comprehensive Survey on Deep Learning Approaches
Md. Zahangir Alom
T. Taha
C. Yakopcic
Stefan Westberg
P. Sidike
Mst Shamima Nasrin
B. Van Essen
A. Awwal
V. Asari
VLM
133
883
0
03 Mar 2018