Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2019
7 August 2019
M. Sadeghi
Simon Leglaive
Xavier Alameda-Pineda
Laurent Girin
Radu Horaud
    DiffM

Papers citing "Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders"

38 / 38 papers shown
Real-Time System for Audio-Visual Target Speech Enhancement
T. Aleksandra Ma
Sile Yin
Li-Chia Yang
Shuo Zhang
141
0
0
25 Sep 2025
End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments
Meng-Ping Lin
Enoch Hsin-Ho Huang
Shao-Yi Chien
Yu Tsao
129
0
0
19 Aug 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
415
34
0
02 Jan 2025
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
ACM Computing Surveys (ACM CSUR), 2024
Luís Vilaça
Yi Yu
Paula Viana
533
3
0
24 Nov 2024
Diffusion-based Unsupervised Audio-visual Speech Enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jean-Eudes Ayilo
Mostafa Sadeghi
Romain Serizel
Xavier Alameda-Pineda
DiffM
404
10
0
04 Oct 2024
FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching
Chaeyoung Jung
Suyeon Lee
Ji-Hoon Kim
Joon Son Chung
DiffM
287
24
0
13 Jun 2024
Missingness-resilient Video-enhanced Multimodal Disfluency Detection
Payal Mohapatra
Shamika Likhite
Subrata Biswas
Bashima Islam
Qi Zhu
274
7
0
11 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Computer Vision and Pattern Recognition (CVPR), 2024
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
278
21
0
07 Jun 2024
Audio-Visual Speech Enhancement in Noisy Environments via Emotion-Based Contextual Cues
Tassadaq Hussain
K. Dashtipour
Yu Tsao
Amir Hussain
275
5
0
26 Feb 2024
Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement
Shafique Ahmed
Chia-Wei Chen
Wenze Ren
Chin-Jou Li
Ernie Chu
Jun-Cheng Chen
Amir Hussain
H. Wang
Yu Tsao
Jen-Cheng Hou
306
6
0
20 Sep 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
IEEE Transactions on Multimedia (IEEE TMM), 2023
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
254
27
0
15 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen, DiffM
304
1
0
31 Jul 2023
Audio-Visual Speech Enhancement with Score-Based Generative Models
Julius Richter
Simone Frintrop
Timo Gerkmann
DiffM
302
14
0
02 Jun 2023
Integrating Uncertainty into Neural Network-based Speech Enhancement
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Hu Fang
Dennis Becker
S. Wermter
Timo Gerkmann
UQCV
224
4
0
15 May 2023
Neural Target Speech Extraction: An Overview
IEEE Signal Processing Magazine (IEEE Signal Process. Mag.), 2023
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
Jan "Honza" Černocký
Dong Yu
238
146
0
31 Jan 2023
Multi-Label Training for Text-Independent Speaker Identification
Yuqi Xue
169
0
0
14 Nov 2022
Fast and efficient speech enhancement with variational autoencoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
M. Sadeghi
Romain Serizel
DRL, BDL
184
6
0
02 Nov 2022
A weighted-variance variational autoencoder model for speech enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
A. Golmakani
M. Sadeghi
Xavier Alameda-Pineda
Romain Serizel
273
2
0
02 Nov 2022
Audio-visual speech enhancement with a deep Kalman filter generative model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
A. Golmakani
M. Sadeghi
Romain Serizel
DiffM
129
10
0
02 Nov 2022
Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings
Ethan Chern
Kuo-Hsuan Hung
Yi-Ting Chen
Tassadaq Hussain
M. Gogate
Amir Hussain
Yu Tsao
Jen-Cheng Hou
SSL
365
20
0
31 Oct 2022
A survey of multimodal deep generative models
Masahiro Suzuki
Y. Matsuo
SyDa, DRL
223
117
0
05 Jul 2022
Few-Shot Audio-Visual Learning of Environment Acoustics
Neural Information Processing Systems (NeurIPS), 2022
Sagnik Majumder
Changan Chen
Ziad Al-Halah
Kristen Grauman
308
73
0
08 Jun 2022
Expression-preserving face frontalization improves visually assisted speech processing
International Journal of Computer Vision (IJCV), 2022
Zhiqi Kang
M. Sadeghi
Radu Horaud
Xavier Alameda-Pineda
CVBM
446
8
0
06 Apr 2022
Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luís Vilaça
Yi Yu
Paula Viana
340
11
0
28 Feb 2022
Visual Acoustic Matching
Computer Vision and Pattern Recognition (CVPR), 2022
Changan Chen
Ruohan Gao
P. Calamia
Kristen Grauman
325
66
0
14 Feb 2022
The impact of removing head movements on audio-visual speech enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhiqi Kang
M. Sadeghi
Radu Horaud
Xavier Alameda-Pineda
Jacob Donley
Anurag Kumar
CVBM
200
7
0
01 Feb 2022
A Novel Temporal Attentive-Pooling based Convolutional Recurrent Architecture for Acoustic Signal Enhancement
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2022
Tassadaq Hussain
Wei-Chien Wang
M. Gogate
K. Dashtipour
Yu Tsao
Xugang Lu
A. Ahsan
Amir Hussain
163
5
0
24 Jan 2022
Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders
Xiaoyu Bie
Simon Leglaive
Xavier Alameda-Pineda
Laurent Girin
DiffM
343
62
0
23 Jun 2021
Variational Structured Attention Networks for Deep Visual Representation Learning
IEEE Transactions on Image Processing (TIP), 2021
Guanglei Yang
Paolo Rota
Xavier Alameda-Pineda
Dan Xu
M. Ding
Elisa Ricci
3DPC
204
6
0
05 Mar 2021
Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
M. Sadeghi
Xavier Alameda-Pineda
106
12
0
08 Feb 2021
Face Frontalization Based on Robustly Fitting a Deformable Shape Model to 3D Landmarks
Zhiqi Kang
M. Sadeghi
Radu Horaud
3DH, CVBM
243
4
0
26 Oct 2020
Improved Lite Audio-Visual Speech Enhancement
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Shang-Yi Chuang
Hsin-Min Wang
Yu Tsao
385
44
0
30 Aug 2020
Deep Variational Generative Models for Audio-visual Speech Separation
V. Nguyen
M. Sadeghi
Elisa Ricci
Xavier Alameda-Pineda
SSL, DRL
244
11
0
17 Aug 2020
SINVAD: Search-based Image Space Navigation for DNN Image Classifier Test Input Generation
Sungmin Kang
R. Feldt
S. Yoo
AAML
231
45
0
19 May 2020
Speaker Re-identification with Speaker Dependent Speech Enhancement
Yanpei Shi
Qiang Huang
Thomas Hain
233
5
0
15 May 2020
Robust Speaker Recognition Using Speech Enhancement And Attention Model
The Speaker and Language Recognition Workshop (Odyssey), 2020
Yanpei Shi
Qiang Huang
Thomas Hain
300
28
0
14 Jan 2020
Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement
IEEE Transactions on Signal Processing (IEEE Trans. Signal Process.), 2019
M. Sadeghi
Xavier Alameda-Pineda
318
25
0
23 Dec 2019
Robust Unsupervised Audio-visual Speech Enhancement Using a Mixture of Variational Autoencoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
M. Sadeghi
Xavier Alameda-Pineda
204
20
0
10 Nov 2019