arXiv:2204.01725
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
AAAI Conference on Artificial Intelligence (AAAI), 2022
4 April 2022
Minsu Kim
Jeong Hun Yeo
Yong Man Ro
Papers citing "Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading"
40 / 40 papers shown
Two Heads Are Better Than One: Audio-Visual Speech Error Correction with Dual Hypotheses
S. Kim
Kangwook Jang
Sungwoo Cho
Joon Son Chung
Hoirin Kim
Se-Young Yun
102
0
0
15 Oct 2025
Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio
Jeong Hun Yeo
Hyeongseop Rha
Sungjune Park
Junil Won
Y. Ro
180
0
0
28 Aug 2025
AD-AVSR: Asymmetric Dual-stream Enhancement for Robust Audio-Visual Speech Recognition
Junxiao Xue
Xiaozhen Liu
Xuecheng Wu
Xinyi Yin
Danlei Huang
Fei Yu
121
0
0
11 Aug 2025
InfoSyncNet: Information Synchronization Temporal Convolutional Network for Visual Speech Recognition
Junxiao Xue
Xiaozhen Liu
Xuecheng Wu
Fei Yu
Jun Wang
139
0
0
04 Aug 2025
MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization
Hyung Kyu Kim
Sangmin Lee
Hak Gu Kim
145
1
0
28 Jul 2025
Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction
Matthew Kit Khinn Teng
Haibo Zhang
Takeshi Saitoh
127
1
0
25 Jul 2025
MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions
Junjie Li
Wenxuan Wu
Shuai Wang
Zexu Pan
Kong Aik Lee
Chao Yang
Haizhou Li
117
1
0
21 Jul 2025
TD3Net: A temporal densely connected multi-dilated convolutional network for lipreading
Journal of Visual Communication and Image Representation (JVCIR), 2025
B. Lee
Wooseok Shin
Sung Won Han
237
0
0
19 Jun 2025
CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge
Zehua Liu
Xiaolou Li
Chen Chen
Lantian Li
D. Wang
232
1
0
27 May 2025
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
343
6
0
07 May 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
408
11
0
14 Mar 2025
Lend a Hand: Semi Training-Free Cued Speech Recognition via MLLM-Driven Hand Modeling for Barrier-free Communication
Guanjie Huang
Danny Hin Kwok Tsang
Li Liu
178
1
0
11 Mar 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
438
5
0
03 Jan 2025
RAL: Redundancy-Aware Lipreading Model Based on Differential Learning with Symmetric Views
Zejun Gu
Junxia Jiang
264
0
0
09 Sep 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Young Jin Ahn
Jungwoo Park
Sangha Park
Jonghyun Choi
Kee-Eung Kim
218
14
0
18 Jun 2024
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Chang Sun
Hong Yang
Bo Qin
VLM
152
4
0
04 Mar 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo
Seunghee Han
Minsu Kim
Y. Ro
312
32
0
23 Feb 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim
Jeong Hun Yeo
Se Jin Park
J. Choi
Y. Ro
290
8
0
18 Jan 2024
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Computer Vision and Pattern Recognition (CVPR), 2023
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
363
16
0
05 Dec 2023
Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Joanna Hong
Se Jin Park
Y. Ro
VLM
315
9
0
23 Oct 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
263
16
0
15 Sep 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
284
10
0
29 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
IEEE International Conference on Computer Vision (ICCV), 2023
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
208
27
0
18 Aug 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
IEEE Transactions on Multimedia (IEEE TMM), 2023
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
187
26
0
15 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
195
10
0
03 Aug 2023
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces
ACM Multimedia (ACM MM), 2023
Ziqiao Peng
Yihao Luo
Yue Shi
Hao-Xuan Xu
Xiangyu Zhu
Jun He
Hongyan Liu
Zhaoxin Fan
261
68
0
19 Jun 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuchen Hu
Ruizhe Li
Cheng Chen
Chengwei Qin
Qiu-shi Zhu
Eng Siong Chng
223
13
0
18 Jun 2023
OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xize Cheng
Tao Jin
Lin Li
Wang Lin
Xinyu Duan
Zhou Zhao
VLM
265
20
0
10 Jun 2023
Intelligible Lip-to-Speech Synthesis with Speech Units
Interspeech (Interspeech), 2023
J. Choi
Minsu Kim
Y. Ro
220
35
0
31 May 2023
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Yuchen Hu
Ruizhe Li
Chen Chen
Heqing Zou
Qiu-shi Zhu
Eng Siong Chng
211
14
0
16 May 2023
Multi-Temporal Lip-Audio Memory for Visual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jeong Hun Yeo
Minsu Kim
Y. Ro
170
16
0
08 May 2023
Word-level Persian Lipreading Dataset
International Conference on Computer and Knowledge Engineering (ICCKE), 2022
J. Peymanfard
Ali Lashini
Samin Heydarian
Hossein Zeinali
N. Mozayani
154
7
0
08 Apr 2023
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
Computer Vision and Pattern Recognition (CVPR), 2023
Jiadong Wang
Xinyuan Qian
Malu Zhang
R. Tan
Haizhou Li
EGVM
204
137
0
29 Mar 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
AAAI Conference on Artificial Intelligence (AAAI), 2023
Minsu Kim
Chae Won Kim
Y. Ro
CVBM
DiffM
143
4
0
27 Feb 2023
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minsu Kim
Joanna Hong
Y. Ro
213
28
0
17 Feb 2023
Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Minsu Kim
Hyungil Kim
Y. Ro
VLM
226
30
0
16 Feb 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
Expert Systems with Applications (ESWA), 2023
J. Peymanfard
Samin Heydarian
Ali Lashini
Hossein Zeinali
Mohammad Reza Mohammadi
N. Mozayani
278
14
0
21 Jan 2023
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
IEEE Transactions on Multimedia (IEEE TMM), 2022
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
264
50
0
21 Nov 2022
Speaker-adaptive Lip Reading with User-dependent Padding
European Conference on Computer Vision (ECCV), 2022
Minsu Kim
Hyunjun Kim
Y. Ro
123
29
0
09 Aug 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Interspeech (Interspeech), 2022
Joanna Hong
Minsu Kim
Daehun Yoo
Y. Ro
218
26
0
13 Jul 2022