Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video

4 April 2022

Papers citing "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video"

| Title | Authors | Tags | Counts | Date |
|---|---|---|---|---|
| NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | Yifan Liang, Fangkun Liu, Andong Li, Xiaodong Li, C. Zheng | | 47 1 0 | 17 Feb 2025 |
| SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Young Jin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim | | 34 7 0 | 18 Jun 2024 |
| Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge | Minsu Kim, Jeong Hun Yeo, J. Choi, Y. Ro | | 34 16 0 | 18 Aug 2023 |
| DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding | J. Choi, Joanna Hong, Y. Ro | DiffM | 27 17 0 | 15 Aug 2023 |
| AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model | Jeong Hun Yeo, Minsu Kim, J. Choi, Dae Hoe Kim, Y. Ro | | 24 18 0 | 15 Aug 2023 |
| RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations | Neha Sahipjohn, Neil Shah, Vishal Tambrahalli, Vineet Gandhi | | 19 2 0 | 03 Jul 2023 |
| Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition | Yuchen Hu, Ruizhe Li, Cheng Chen, Chengwei Qin, Qiu-shi Zhu, E. Chng | | 29 5 0 | 18 Jun 2023 |
| Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation | Ruixin Zheng, Yang Ai, Zhenhua Ling | | 24 8 0 | 24 May 2023 |
| On the Audio-visual Synchronization for Lip-to-Speech Synthesis | Zhe Niu, Brian Mak | | 13 3 0 | 01 Mar 2023 |
| Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video | Minsu Kim, Chae Won Kim, Y. Ro | CVBM, DiffM | 30 3 0 | 27 Feb 2023 |
| Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition | Minsu Kim, Hyungil Kim, Y. Ro | VLM | 13 18 0 | 16 Feb 2023 |
| SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory | Se Jin Park, Minsu Kim, Joanna Hong, J. Choi, Y. Ro | CVBM | 19 85 0 | 02 Nov 2022 |
| Speaker-adaptive Lip Reading with User-dependent Padding | Minsu Kim, Hyunjun Kim, Y. Ro | | 17 20 0 | 09 Aug 2022 |
| Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition | Joanna Hong, Minsu Kim, Daehun Yoo, Y. Ro | | 23 20 0 | 13 Jul 2022 |
| VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection | Joanna Hong, Minsu Kim, Y. Ro | CVBM, DiffM | 28 8 0 | 15 Jun 2022 |
| Lip to Speech Synthesis with Visual Context Attentional GAN | Minsu Kim, Joanna Hong, Y. Ro | | 15 50 0 | 04 Apr 2022 |
| Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | Minsu Kim, Jeong Hun Yeo, Yong Man Ro | | 13 61 0 | 04 Apr 2022 |
| Lipreading using Temporal Convolutional Networks | Brais Martínez, Pingchuan Ma, Stavros Petridis, M. Pantic | | 168 238 0 | 23 Jan 2020 |