Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.08209
Cited By
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
17 May 2020
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"
50 / 67 papers shown
Title
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
Anton Van Den Hengel
Yuankai Qi
Qingming Huang
207
0
0
02 May 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
51
0
0
29 Apr 2025
Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis
Radek Daněček
Carolin Schmitt
Senya Polikovsky
Michael J. Black
38
0
0
18 Apr 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
58
1
0
21 Mar 2025
Shushing! Let's Imagine an Authentic Speech from the Silent Video
Jiaxin Ye
Hongming Shan
DiffM
VGen
71
1
0
19 Mar 2025
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
Yifan Liu
Yu Fang
Zhouhan Lin
45
0
0
07 Mar 2025
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
Yifan Liang
Fangkun Liu
Andong Li
Xiaodong Li
C. Zheng
49
1
0
17 Feb 2025
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
Anton Van Den Hengel
Jian Yang
Qingming Huang
94
6
0
12 Dec 2024
USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis
Luca Jiang-Tao Yu
Running Zhao
Sijie Ji
Edith C.H. Ngai
Chenshu Wu
33
0
0
29 Oct 2024
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
38
4
0
22 Sep 2024
Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models
N. Shah
Shirish S. Karande
Vineet Gandhi
41
1
0
26 Jul 2024
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing
Neha Sahipjohn
Ashishkumar Gudmalwar
Nirmesh Shah
Pankaj Wasnik
R. Shah
43
5
0
13 Jun 2024
Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction
Zhaoxi Mu
Xinyu Yang
40
6
0
19 Apr 2024
Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Sindhu B. Hegde
Rudrabha Mukhopadhyay
C. V. Jawahar
Vinay P. Namboodiri
27
4
0
02 Mar 2024
DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser
Peng Chen
Xiaobao Wei
Ming Lu
Yitong Zhu
Nai-Ming Yao
Xingyu Xiao
Hui Chen
34
12
0
28 Nov 2023
TRAVID: An End-to-End Video Translation Framework
Prottay Kumar Adhikary
Bandaru Sugandhi
Subhojit Ghimire
Santanu Pal
Partha Pakray
24
2
0
20 Sep 2023
Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment
Zheng-Yan Sheng
Yang Ai
Yan-Nian Chen
Zhenhua Ling
CVBM
19
4
0
18 Sep 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
37
5
0
29 Aug 2023
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang
Jianbo Ma
Santiago Pascual
Richard Cartwright
Weidong (Tom) Cai
VGen
23
39
0
18 Aug 2023
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
J. Choi
Joanna Hong
Y. Ro
DiffM
29
19
0
15 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
DiffM
38
1
0
31 Jul 2023
RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations
Neha Sahipjohn
Neil Shah
Vishal Tambrahalli
Vineet Gandhi
24
2
0
03 Jul 2023
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units
Junchen Lu
Berrak Sisman
Mingyang Zhang
Haizhou Li
32
4
0
29 Jun 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
40
3
0
27 Jun 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Cheng Chen
Chengwei Qin
Qiu-shi Zhu
Eng Siong Chng
39
5
0
18 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
33
11
0
05 Jun 2023
Intelligible Lip-to-Speech Synthesis with Speech Units
J. Choi
Minsu Kim
Y. Ro
32
24
0
31 May 2023
Zero-shot personalized lip-to-speech synthesis with face image based voice control
Zheng-Yan Sheng
Yang Ai
Zhenhua Ling
CVBM
27
5
0
09 May 2023
Conditional Generation of Audio from Video via Foley Analogies
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
25
38
0
17 Apr 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
85
160
0
21 Mar 2023
On the Audio-visual Synchronization for Lip-to-Speech Synthesis
Zhe Niu
Brian Mak
22
3
0
01 Mar 2023
Cross-modal Face- and Voice-style Transfer
Naoya Takahashi
M. Singh
Yuki Mitsufuji
CVBM
56
2
0
27 Feb 2023
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
Minsu Kim
Joanna Hong
Y. Ro
22
21
0
17 Feb 2023
LipLearner: Customizable Silent Speech Interactions on Mobile Devices
Zixiong Su
Shitao Fang
Jun Rekimoto
18
26
0
12 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
42
26
0
01 Feb 2023
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming Yang
Qin Huang
75
25
0
08 Dec 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
35
16
0
05 Oct 2022
Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild
Sindhu B. Hegde
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
45
10
0
01 Sep 2022
Dual-Path Cross-Modal Attention for better Audio-Visual Speech Extraction
Zhongweiyang Xu
Xulin Fan
M. Hasegawa-Johnson
19
2
0
09 Jul 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis
Yongqiang Wang
Zhou Zhao
19
10
0
08 Jul 2022
Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai
L. A. Khaliq
Timon Ulrich
CVBM
68
0
0
28 Jun 2022
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
Joanna Hong
Minsu Kim
Y. Ro
CVBM
DiffM
36
8
0
15 Jun 2022
Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models
Hadeel Mabrouk
Omar Abugabal
Nourhan Sakr
Hesham M. Eraqi
VLM
33
2
0
05 Jun 2022
Learning Speaker-specific Lip-to-Speech Generation
Munender Varshney
Ravindra Yadav
Vinay P. Namboodiri
R. Hegde
26
7
0
04 Jun 2022
Learning Visual Styles from Audio-Visual Associations
Tingle Li
Yichen Liu
Andrew Owens
Hang Zhao
DiffM
23
21
0
10 May 2022
SVTS: Scalable Video-to-Speech Synthesis
Rodrigo Mira
A. Haliassos
Stavros Petridis
Björn W. Schuller
M. Pantic
22
32
0
04 May 2022
Lip to Speech Synthesis with Visual Context Attentional GAN
Minsu Kim
Joanna Hong
Y. Ro
33
51
0
04 Apr 2022
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Minsu Kim
Joanna Hong
Se Jin Park
Yong Man Ro
CVBM
25
40
0
04 Apr 2022
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Karren D. Yang
Dejan Marković
Steven Krenn
Vasu Agrawal
Alexander Richard
VGen
18
32
0
31 Mar 2022
Self-supervised Transformer for Deepfake Detection
Hanqing Zhao
Wenbo Zhou
Dongdong Chen
Weiming Zhang
Nenghai Yu
ViT
19
34
0
02 Mar 2022
1
2
Next