2303.05309
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
9 March 2023
Xize Cheng
Lin Li
Tao Jin
Rongjie Huang
Wang Lin
Zehan Wang
Huangdai Liu
Yejin Wang
Aoxiong Yin
Zhou Zhao
Papers citing "MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition"
17 / 17 papers shown
Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders
Seungbae Kim
Daeun Lee
Brielle Stark
Jinyoung Han
31
0
0
18 Feb 2025
Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation
Lucas Goncalves
Prashant Mathur
Xing Niu
Brady Houston
Chandrashekhar Lavania
Srikanth Vishnubhotla
Lijia Sun
Anthony Ferritto
67
0
0
21 Dec 2024
SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing
Lingyu Xiong
Xize Cheng
Jintao Tan
Xianjia Wu
Xiandong Li
Lei Zhu
Fei Ma
Minglei Li
Huang Xu
Zhihu Hu
24
3
0
05 Sep 2024
Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation
Jintao Tan
Xize Cheng
Lingyu Xiong
Lei Zhu
Xiandong Li
Xianjia Wu
Kai Gong
Minglei Li
Yi Cai
DiffM
21
2
0
03 Aug 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
HyoJung Han
Mohamed Anwar
J. Pino
Wei-Ning Hsu
Marine Carpuat
Bowen Shi
Changhan Wang
VLM
27
9
0
21 Mar 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo
Seunghee Han
Minsu Kim
Y. Ro
43
11
0
23 Feb 2024
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
Xize Cheng
Rongjie Huang
Linjun Li
Tao Jin
Zehan Wang
Aoxiong Yin
Minglei Li
Xinyu Duan
Changpeng Yang
Zhou Zhao
28
2
0
23 Dec 2023
Language Model is a Branch Predictor for Simultaneous Machine Translation
Aoxiong Yin
Tianyun Zhong
Haoyuan Li
Siliang Tang
Zhou Zhao
14
1
0
22 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
17
12
0
05 Dec 2023
Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model
Joanna Hong
Se Jin Park
Y. Ro
VLM
9
6
0
23 Oct 2023
OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
Xize Cheng
Tao Jin
Lin Li
Wang Lin
Xinyu Duan
Zhou Zhao
VLM
6
15
0
10 Jun 2023
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Huadai Liu
Rongjie Huang
Jinzheng He
Gang Sun
Ran Shen
Xize Cheng
Zhou Zhao
25
3
0
21 May 2023
Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
Qianqian Dong
Fengpeng Yue
Tom Ko
Mingxuan Wang
Qibing Bai
Yu Zhang
27
16
0
18 May 2022
Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song
Sanghyun Woo
Junhyeok Lee
S. Yang
Hyunjae Cho
Youseong Lee
Dongho Choi
Kang-Wook Kim
CVBM
32
21
0
13 May 2022
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Changdae Oh
Junhyuk So
Hoyoon Byun
Yongtaek Lim
Minchul Shin
Jong-June Jeon
Kyungwoo Song
21
26
0
08 Mar 2022
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
214
2,224
0
14 Jun 2018
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
162
783
0
16 Nov 2016