arXiv:1908.04917
A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading
ACM Multimedia Asia (MMAsia), 2019
14 August 2019
Ya Zhao
Rui Xu
Xiuming Zhang
Papers citing
"A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading"
26 / 26 papers shown
AvatarSync: Rethinking Talking-Head Animation through Phoneme-Guided Autoregressive Perspective
Yuchen Deng
Xiuyang Wu
Hai-Tao Zheng
Suiyang Zhang
Yi He
Yuxing Han
VGen
182
0
0
15 Sep 2025
Learning Speaker-Invariant Visual Features for Lipreading
Yu Li
Feng Xue
S. Li
J. Zhang
Shuang Yang
D. Guo
Richang Hong
196
0
0
09 Jun 2025
ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition
Thai-Binh Nguyen
T. Nguyen
Quoc Truong Do
Chi Mai Luong
193
0
0
05 Jun 2025
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides
Jinghua Zhao
Yuhang Jia
Shiyao Wang
Jiaming Zhou
Hui Wang
Yong Qin
330
3
0
21 Apr 2025
STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing
Zijun Ding
Mingdie Xiong
Congcong Zhu
Jingrun Chen
DiffM
373
0
0
29 Mar 2025
RAL: Redundancy-Aware Lipreading Model Based on Differential Learning with Symmetric Views
Zejun Gu
Junxia Jiang
306
0
0
09 Sep 2024
A Large-scale Universal Evaluation Benchmark For Face Forgery Detection
Yijun Bei
Hengrui Lou
Jinsong Geng
Erteng Liu
Lechao Cheng
Jie Song
Mingli Song
Zunlei Feng
CVBM
441
3
0
13 Jun 2024
Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization
Linzhi Wu
Xingyu Zhang
Yakun Zhang
Changyan Zheng
Tiejun Liu
Liang Xie
Ye Yan
Erwei Yin
197
5
0
24 Mar 2024
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Chang Sun
Hong Yang
Bo Qin
VLM
184
4
0
04 Mar 2024
MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning
Huiyu Xiong
Lanxiao Wang
Heqian Qiu
Taijin Zhao
Benliu Qiu
Hongliang Li
CLL
243
1
0
27 Feb 2024
AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies
José-M. Acosta-Triana
David Gimeno-Gómez
Carlos David Martínez Hinarejos
VLM
VGen
306
4
0
20 Feb 2024
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
IEEE International Conference on Computer Vision (ICCV), 2023
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
234
31
0
18 Aug 2023
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
IEEE International Conference on Multimedia and Expo (ICME), 2023
Yusheng Dai
Hang Chen
Jun Du
Xiao-Ying Ding
Ning Ding
Feijun Jiang
Chin-Hui Lee
298
10
0
14 Aug 2023
Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey
Image and Vision Computing (IVC), 2023
Praneeth Nemani
G. S. Krishna
Kundrapu Supriya
BDL
179
5
0
14 Jun 2023
Learning Cross-lingual Visual Speech Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Andreas Zinonos
A. Haliassos
Pingchuan Ma
Stavros Petridis
Maja Pantic
SSL
177
10
0
14 Mar 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Xize Cheng
Lin Li
Tao Jin
Rongjie Huang
Wang Lin
Zehan Wang
Huangdai Liu
Yejin Wang
Aoxiong Yin
Zhou Zhao
240
29
0
09 Mar 2023
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Interspeech, 2023
Mohamed Anwar
Bowen Shi
Vedanuj Goswami
Wei-Ning Hsu
J. Pino
Changhan Wang
254
48
0
01 Mar 2023
LipFormer: Learning to Lipread Unseen Speakers based on Visual-Landmark Transformers
Feng Xue
Yu Li
Deyin Liu
Yincen Xie
Lin Wu
Richang Hong
198
28
0
04 Feb 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
Expert Systems with Applications (ESWA), 2023
J. Peymanfard
Samin Heydarian
Ali Lashini
Hossein Zeinali
Mohammad Reza Mohammadi
N. Mozayani
352
14
0
21 Jan 2023
Visual Speech Recognition for Multiple Languages in the Wild
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Pingchuan Ma
Stavros Petridis
Maja Pantic
VLM
429
199
0
26 Feb 2022
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
276
34
0
09 Dec 2021
Advances and Challenges in Deep Lip Reading
Marzieh Oghbaie
Arian Sabaghi
Kooshan Hashemifard
Mohammad Akbari
VLM
167
17
0
15 Oct 2021
Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks
Anuj Saraswat
Mehar Bhatia
Yaman Kumar Singla
Changyou Chen
R. Shah
167
0
0
13 Oct 2021
Synchronous Bidirectional Learning for Multilingual Lip Reading
Mingshuang Luo
Shuang Yang
Xilin Chen
Zitao Liu
Shiguang Shan
176
18
0
08 May 2020
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2020
Yuanhang Zhang
Shuang Yang
Jingyun Xiao
Shiguang Shan
Xilin Chen
372
71
0
06 Mar 2020
Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers
AAAI Conference on Artificial Intelligence (AAAI), 2019
Ya Zhao
Rui Xu
Xinchao Wang
Peng Hou
Haihong Tang
Xiuming Zhang
249
103
0
26 Nov 2019