arXiv:2412.08988
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Computer Vision and Pattern Recognition (CVPR), 2024
12 December 2024
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
Anton Van Den Hengel
Jian Yang
Qingming Huang
Papers citing "EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing"
Showing 50 of 53 papers
EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis
Yijie Guo
Dexiang Hong
Weidong Chen
Zihan She
Cheng Ye
Xiaojun Chang
Zhendong Mao
80
0
0
16 Nov 2025
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
Siyi Zhou
Yiquan Zhou
Yi He
Xun Zhou
Jinchao Wang
Wei Deng
Jingchen Shu
DiffM
139
14
0
23 Jun 2025
Towards Film-Making Production Dialogue, Narration, Monologue Adaptive Moving Dubbing Benchmarks
Chaoyi Wang
Junjie Zheng
Zihao Chen
Shiyu Xia
Chaofan Ding
Xiaohao Zhang
Xi Tao
Xiaoming He
Xinhan Di
AuLLM
809
0
0
30 Apr 2025
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
VGen
247
2
0
31 Mar 2025
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Computer Vision and Pattern Recognition (CVPR), 2025
Zhedong Zhang
Liang-Sheng Li
C. Yan
Chunshan Liu
Anton Van Den Hengel
Yuankai Qi
274
4
0
15 Mar 2025
Generative AI for Cel-Animation: A Survey
Yunlong Tang
Junjia Guo
Pinxin Liu
Zhiyuan Wang
Hang Hua
...
Jing Bi
Mingqian Feng
Xuzhao Li
Zeliang Zhang
Chenliang Xu
VGen
591
17
0
08 Jan 2025
HELPNet: Hierarchical Perturbations Consistency and Entropy-guided Ensemble for Scribble Supervised Medical Image Segmentation
Xiao Zhang
Shaoxuan Wu
Peilin Zhang
Zhuo Jin
Xiaosong Xiong
Qirong Bu
Jingkun Chen
Jun Feng
222
2
0
25 Dec 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
International Conference on Learning Representations (ICLR), 2024
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Zhizheng Wu
344
136
0
01 Sep 2024
Synchronous Multi-modal Semantic Communication System with Packet-level Coding
IEEE Transactions on Wireless Communications (IEEE TWC), 2024
Yun Tian
Jingkai Ying
Zhijin Qin
Ye Jin
Xiaoming Tao
223
13
0
08 Aug 2024
Generating High-quality Symbolic Music Using Fine-grained Discriminators
International Conference on Pattern Recognition (ICPR), 2024
Zhedong Zhang
Liang-Sheng Li
Jiehua Zhang
Zhenghui Hu
Hongkui Wang
Chenggang Yan
Jian Yang
Yuankai Qi
198
5
0
03 Aug 2024
Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
Yunbin Tu
Liang-Sheng Li
Li Su
Chenggang Yan
Qin Huang
208
14
0
16 Jul 2024
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Ziyang Ma
Mingjie Chen
Hezhao Zhang
Zhisheng Zheng
Wenxi Chen
Xiquan Li
Jiaxin Ye
Xie Chen
Thomas Hain
241
43
0
11 Jun 2024
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Computer Vision and Pattern Recognition (CVPR), 2024
Youngjoon Jang
Ji-Hoon Kim
Junseok Ahn
Doyeop Kwak
Hong-Sun Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
CVBM
199
18
0
16 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Shiyang Feng
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Jiaming Song
VGen
282
120
0
09 May 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
1.5K
2,606
0
05 Mar 2024
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing
Gaoxiang Cong
Yuankai Qi
Liang-Sheng Li
Amin Beheshti
Zhedong Zhang
Anton Van Den Hengel
Ming-Hsuan Yang
Chenggang Yan
Qingming Huang
262
21
0
20 Feb 2024
Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Wei Tang
Liang Li
Xuejing Liu
Lu Jin
Jinhui Tang
Zechao Li
224
41
0
19 Dec 2023
Equivariant Flow Matching with Hybrid Probability Transport
Neural Information Processing Systems (NeurIPS), 2023
Yuxuan Song
Jingjing Gong
Minkai Xu
Ziyao Cao
Yanyan Lan
Stefano Ermon
Hao Zhou
Wei-Ying Ma
DiffM
244
80
0
12 Dec 2023
Self-supervised Cross-view Representation Reconstruction for Change Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Yunbin Tu
Liang Li
Filippos Christianos
Zheng-Jun Zha
Zhibin Li
Qingming Huang
SSL
161
36
0
28 Sep 2023
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yiwei Guo
Chenpeng Du
Ziyang Ma
Xie Chen
K. Yu
DiffM
220
61
0
10 Sep 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter
252
169
0
06 Sep 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
International Conference on Learning Representations (ICLR), 2023
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
286
21
0
05 Jun 2023
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis
Interspeech (Interspeech), 2023
Haobin Tang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
DiffM
218
39
0
01 Jun 2023
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
Computer Vision and Pattern Recognition (CVPR), 2023
Jiadong Wang
Xinyuan Qian
Malu Zhang
R. Tan
Haizhou Li
EGVM
167
134
0
29 Mar 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
162
44
0
27 Feb 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
AAAI Conference on Artificial Intelligence (AAAI), 2023
Minsu Kim
Chae Won Kim
Y. Ro
CVBM
DiffM
103
4
0
27 Feb 2023
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minsu Kim
Joanna Hong
Y. Ro
175
28
0
17 Feb 2023
Audio-Visual Efficient Conformer for Robust Speech Recognition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Maxime Burchi
Radu Timofte
VLM
143
48
0
04 Jan 2023
Learning to Dub Movies via Hierarchical Prosody Models
Computer Vision and Pattern Recognition (CVPR), 2022
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming-Hsuan Yang
Qin Huang
266
37
0
08 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
908
5,540
0
06 Dec 2022
EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yiwei Guo
Chenpeng Du
Xie Chen
K. Yu
DiffM
203
55
0
17 Nov 2022
Flow Matching for Generative Modeling
International Conference on Learning Representations (ICLR), 2022
Y. Lipman
Ricky T. Q. Chen
Heli Ben-Hamu
Maximilian Nickel
Matt Le
OOD
919
2,703
0
06 Oct 2022
Speech Synthesis with Mixed Emotions
IEEE Transactions on Affective Computing (IEEE TAC), 2022
Kun Zhou
Berrak Sisman
R. Rana
B. W. Schuller
Haizhou Li
296
61
0
11 Aug 2022
Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Xuejing Liu
Liang Li
Shuhui Wang
Zhengjun Zha
Dechao Meng
Qi Tian
Qingming Huang
177
72
0
18 Jul 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
International Conference on Learning Representations (ICLR), 2022
Sang-gil Lee
Ming-Yu Liu
Boris Ginsburg
Bryan Catanzaro
Sung-Hoon Yoon
271
373
0
09 Jun 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Interspeech (Interspeech), 2022
Yixuan Zhou
Changhe Song
Xiang Li
Lu Zhang
Zhiyong Wu
Yanyao Bian
Jane Polak Scowcroft
Helen Meng
223
27
0
03 Apr 2022
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Zhuliang Yu
Qi Wu
VGen
150
32
0
25 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Computer Vision and Pattern Recognition (CVPR), 2021
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
184
21
0
19 Nov 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
226
51
0
15 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over
Junchen Lu
Berrak Sisman
Rui Liu
Mingyang Zhang
Haizhou Li
DiffM
218
29
0
07 Oct 2021
Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation
International Conference on Machine Learning (ICML), 2021
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
294
206
0
06 Jun 2021
Diffusion Models Beat GANs on Image Synthesis
Neural Information Processing Systems (NeurIPS), 2021
Prafulla Dhariwal
Alex Nichol
1.5K
10,090
0
11 May 2021
End-to-end Audio-visual Speech Recognition with Conformers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Pingchuan Ma
Stavros Petridis
Maja Pantic
256
280
0
12 Feb 2021
Maximum Likelihood Training of Score-Based Diffusion Models
Neural Information Processing Systems (NeurIPS), 2021
Yang Song
Conor Durkan
Iain Murray
Stefano Ermon
DiffM
669
787
0
22 Jan 2021
A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
EGVM
383
988
0
23 Aug 2020
Towards Practical Lipreading with Distilled and Efficient Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Pingchuan Ma
Brais Martínez
Stavros Petridis
Maja Pantic
223
107
0
13 Jul 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
510
1,618
0
08 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
232
566
0
22 May 2020
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
156
129
0
17 May 2020
Lipreading using Temporal Convolutional Networks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Brais Martínez
Pingchuan Ma
Stavros Petridis
Maja Pantic
374
281
0
23 Jan 2020