1908.02590
Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2019
7 August 2019
M. Sadeghi
Simon Leglaive
Xavier Alameda-Pineda
Laurent Girin
Radu Horaud
DiffM
Papers citing
"Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders"
38 / 38 papers shown
Real-Time System for Audio-Visual Target Speech Enhancement
T. Aleksandra Ma
Sile Yin
Li-Chia Yang
Shuo Zhang
141
0
0
25 Sep 2025
End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments
Meng-Ping Lin
Enoch Hsin-Ho Huang
Shao-Yi Chien
Yu Tsao
129
0
0
19 Aug 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
415
34
0
02 Jan 2025
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
ACM Computing Surveys (ACM CSUR), 2024
Luís Vilaça
Yi Yu
Paula Viana
533
3
0
24 Nov 2024
Diffusion-based Unsupervised Audio-visual Speech Enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jean-Eudes Ayilo
Mostafa Sadeghi
Romain Serizel
Xavier Alameda-Pineda
DiffM
404
10
0
04 Oct 2024
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
Chaeyoung Jung
Suyeon Lee
Ji-Hoon Kim
Joon Son Chung
DiffM
287
24
0
13 Jun 2024
Missingness-resilient Video-enhanced Multimodal Disfluency Detection
Payal Mohapatra
Shamika Likhite
Subrata Biswas
Bashima Islam
Qi Zhu
274
7
0
11 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
278
21
0
07 Jun 2024
Audio-Visual Speech Enhancement in Noisy Environments via Emotion-Based Contextual Cues
Tassadaq Hussain
K. Dashtipour
Yu Tsao
Amir Hussain
275
5
0
26 Feb 2024
Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement
Shafique Ahmed
Chia-Wei Chen
Wenze Ren
Chin-Jou Li
Ernie Chu
Jun-Cheng Chen
Amir Hussain
H. Wang
Yu Tsao
Jen-Cheng Hou
306
6
0
20 Sep 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
IEEE Transactions on Multimedia (IEEE TMM), 2023
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
254
27
0
15 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
DiffM
304
1
0
31 Jul 2023
Audio-Visual Speech Enhancement with Score-Based Generative Models
Julius Richter
Simone Frintrop
Timo Gerkmann
DiffM
302
14
0
02 Jun 2023
Integrating Uncertainty into Neural Network-based Speech Enhancement
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Hu Fang
Dennis Becker
S. Wermter
Timo Gerkmann
UQCV
224
4
0
15 May 2023
Neural Target Speech Extraction: An Overview
IEEE Signal Processing Magazine (IEEE Signal Process. Mag.), 2023
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
Jan "Honza" Černocký
Dong Yu
238
146
0
31 Jan 2023
Multi-Label Training for Text-Independent Speaker Identification
Yuqi Xue
169
0
0
14 Nov 2022
Fast and efficient speech enhancement with variational autoencoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
M. Sadeghi
Romain Serizel
DRL
BDL
184
6
0
02 Nov 2022
A weighted-variance variational autoencoder model for speech enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
A. Golmakani
M. Sadeghi
Xavier Alameda-Pineda
Romain Serizel
273
2
0
02 Nov 2022
Audio-visual speech enhancement with a deep Kalman filter generative model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
A. Golmakani
M. Sadeghi
Romain Serizel
DiffM
129
10
0
02 Nov 2022
Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings
Ethan Chern
Kuo-Hsuan Hung
Yi-Ting Chen
Tassadaq Hussain
M. Gogate
Amir Hussain
Yu Tsao
Jen-Cheng Hou
SSL
365
20
0
31 Oct 2022
A survey of multimodal deep generative models
Masahiro Suzuki
Y. Matsuo
SyDa
DRL
223
117
0
05 Jul 2022
Few-Shot Audio-Visual Learning of Environment Acoustics
Neural Information Processing Systems (NeurIPS), 2022
Sagnik Majumder
Changan Chen
Ziad Al-Halah
Kristen Grauman
308
73
0
08 Jun 2022
Expression-preserving face frontalization improves visually assisted speech processing
International Journal of Computer Vision (IJCV), 2022
Zhiqi Kang
M. Sadeghi
Radu Horaud
Xavier Alameda-Pineda
CVBM
446
8
0
06 Apr 2022
Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luís Vilaça
Yi Yu
Paula Viana
340
11
0
28 Feb 2022
Visual Acoustic Matching
Computer Vision and Pattern Recognition (CVPR), 2022
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
325
66
0
14 Feb 2022
The impact of removing head movements on audio-visual speech enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhiqi Kang
M. Sadeghi
Radu Horaud
Xavier Alameda-Pineda
Jacob Donley
Anurag Kumar
CVBM
200
7
0
01 Feb 2022
A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2022
Tassadaq Hussain
Wei-Chien Wang
M. Gogate
K. Dashtipour
Yu Tsao
Xugang Lu
A. Ahsan
Amir Hussain
163
5
0
24 Jan 2022
Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders
Xiaoyu Bie
Simon Leglaive
Xavier Alameda-Pineda
Laurent Girin
DiffM
343
62
0
23 Jun 2021
Variational Structured Attention Networks for Deep Visual Representation Learning
IEEE Transactions on Image Processing (TIP), 2021
Guanglei Yang
Paolo Rota
Xavier Alameda-Pineda
Dan Xu
M. Ding
Elisa Ricci
3DPC
204
6
0
05 Mar 2021
Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
M. Sadeghi
Xavier Alameda-Pineda
106
12
0
08 Feb 2021
Face Frontalization Based on Robustly Fitting a Deformable Shape Model to 3D Landmarks
Zhiqi Kang
M. Sadeghi
Radu Horaud
3DH
CVBM
243
4
0
26 Oct 2020
Improved Lite Audio-Visual Speech Enhancement
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Shang-Yi Chuang
Hsin-Min Wang
Yu Tsao
385
44
0
30 Aug 2020
Deep Variational Generative Models for Audio-visual Speech Separation
V. Nguyen
M. Sadeghi
Elisa Ricci
Xavier Alameda-Pineda
SSL
DRL
244
11
0
17 Aug 2020
SINVAD: Search-based Image Space Navigation for DNN Image Classifier Test Input Generation
Sungmin Kang
R. Feldt
S. Yoo
AAML
231
45
0
19 May 2020
Speaker Re-identification with Speaker Dependent Speech Enhancement
Yanpei Shi
Qiang Huang
Thomas Hain
233
5
0
15 May 2020
Robust Speaker Recognition Using Speech Enhancement And Attention Model
The Speaker and Language Recognition Workshop (Odyssey), 2020
Yanpei Shi
Qiang Huang
Thomas Hain
300
28
0
14 Jan 2020
Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement
IEEE Transactions on Signal Processing (IEEE Trans. Signal Process.), 2019
M. Sadeghi
Xavier Alameda-Pineda
318
25
0
23 Dec 2019
Robust Unsupervised Audio-visual Speech Enhancement Using a Mixture of Variational Autoencoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
M. Sadeghi
Xavier Alameda-Pineda
204
20
0
10 Nov 2019