Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
arXiv: 2204.01265

4 April 2022
Minsu Kim
Joanna Hong
Se Jin Park
Yong Man Ro
    CVBM

Papers citing "Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video"

18 / 18 papers shown
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
Yifan Liang
Fangkun Liu
Andong Li
Xiaodong Li
C. Zheng
47
1
0
17 Feb 2025
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Young Jin Ahn
Jungwoo Park
Sangha Park
Jonghyun Choi
Kee-Eung Kim
34
7
0
18 Jun 2024
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
34
16
0
18 Aug 2023
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
J. Choi
Joanna Hong
Y. Ro
DiffM
27
17
0
15 Aug 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
24
18
0
15 Aug 2023
RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations
Neha Sahipjohn
Neil Shah
Vishal Tambrahalli
Vineet Gandhi
19
2
0
03 Jul 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Cheng Chen
Chengwei Qin
Qiu-shi Zhu
E. Chng
29
5
0
18 Jun 2023
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation
Ruixin Zheng
Yang Ai
Zhenhua Ling
24
8
0
24 May 2023
On the Audio-visual Synchronization for Lip-to-Speech Synthesis
Zhe Niu
Brian Mak
13
3
0
01 Mar 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
Minsu Kim
Chae Won Kim
Y. Ro
CVBM
DiffM
30
3
0
27 Feb 2023
Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
Minsu Kim
Hyungil Kim
Y. Ro
VLM
13
18
0
16 Feb 2023
SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
Se Jin Park
Minsu Kim
Joanna Hong
J. Choi
Y. Ro
CVBM
19
85
0
02 Nov 2022
Speaker-adaptive Lip Reading with User-dependent Padding
Minsu Kim
Hyunjun Kim
Y. Ro
17
20
0
09 Aug 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Joanna Hong
Minsu Kim
Daehun Yoo
Y. Ro
23
20
0
13 Jul 2022
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
Joanna Hong
Minsu Kim
Y. Ro
CVBM
DiffM
28
8
0
15 Jun 2022
Lip to Speech Synthesis with Visual Context Attentional GAN
Minsu Kim
Joanna Hong
Y. Ro
15
50
0
04 Apr 2022
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
Minsu Kim
Jeong Hun Yeo
Yong Man Ro
13
61
0
04 Apr 2022
Lipreading using Temporal Convolutional Networks
Brais Martínez
Pingchuan Ma
Stavros Petridis
M. Pantic
168
238
0
23 Jan 2020