Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.02512
Cited By
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
5 December 2023
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation"
17 / 17 papers shown
Title
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
A. Hengel
Yuankai Qi
Qingming Huang
58
0
0
02 May 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
48
0
0
29 Apr 2025
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
Lee Chae-Yeon
Oh Hyun-Bin
Han EunGi
Kim Sung-Bin
Suekyeong Nam
Tae-Hyun Oh
EGVM
3DH
80
0
1
26 Mar 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho
J. Choi
Sungnyun Kim
Se-Young Yun
54
0
0
14 Mar 2025
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
63
0
0
26 Feb 2025
Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation
Lucas Goncalves
Prashant Mathur
Xing Niu
Brady Houston
Chandrashekhar Lavania
Srikanth Vishnubhotla
Lijia Sun
Anthony Ferritto
67
0
0
21 Dec 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
HyoJung Han
Mohamed Anwar
J. Pino
Wei-Ning Hsu
Marine Carpuat
Bowen Shi
Changhan Wang
VLM
27
9
0
21 Mar 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
20
6
0
25 Feb 2024
Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
Joanna Hong
Minsu Kim
J. Choi
Y. Ro
27
19
0
15 Mar 2023
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
112
144
0
26 Feb 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
171
372
0
04 Dec 2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
110
192
0
14 Oct 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
79
221
0
12 Feb 2021
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
174
336
0
01 Feb 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
214
2,224
0
14 Jun 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Z. Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
201
817
0
12 Jun 2018
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
162
782
0
16 Nov 2016
1