Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
Minsu Kim, Jeong Hun Yeo, Yong Man Ro
AAAI Conference on Artificial Intelligence (AAAI), 2022
arXiv 2204.01725, 4 April 2022

Papers citing "Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading"

40 papers
Two Heads Are Better Than One: Audio-Visual Speech Error Correction with Dual Hypotheses
S. Kim, Kangwook Jang, Sungwoo Cho, Joon Son Chung, Hoirin Kim, Se-Young Yun
15 Oct 2025
Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio
Jeong Hun Yeo, Hyeongseop Rha, Sungjune Park, Junil Won, Y. Ro
28 Aug 2025
AD-AVSR: Asymmetric Dual-stream Enhancement for Robust Audio-Visual Speech Recognition
Junxiao Xue, Xiaozhen Liu, Xuecheng Wu, Xinyi Yin, Danlei Huang, Fei Yu
11 Aug 2025
InfoSyncNet: Information Synchronization Temporal Convolutional Network for Visual Speech Recognition
Junxiao Xue, Xiaozhen Liu, Xuecheng Wu, Fei Yu, Jun Wang
04 Aug 2025
MemoryTalker: Personalized Speech-Driven 3D Facial Animation via Audio-Guided Stylization
Hyung Kyu Kim, Sangmin Lee, Hak Gu Kim
28 Jul 2025
Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction
Matthew Kit Khinn Teng, Haibo Zhang, Takeshi Saitoh
25 Jul 2025
MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions
Junjie Li, Wenxuan Wu, Shuai Wang, Zexu Pan, Kong Aik Lee, Chao Yang, Haizhou Li
21 Jul 2025
TD3Net: A temporal densely connected multi-dilated convolutional network for lipreading
Journal of Visual Communication and Image Representation (JVCIR), 2025
B. Lee, Wooseok Shin, Sung Won Han
19 Jun 2025
CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge
Zehua Liu, Xiaolou Li, Chen Chen, Lantian Li, D. Wang
27 May 2025
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park, R.-H. Park, Hyung-Min Park
07 May 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jeong Hun Yeo, Hyeongseop Rha, Se Jin Park, Y. Ro
14 Mar 2025
Lend a Hand: Semi Training-Free Cued Speech Recognition via MLLM-Driven Hand Modeling for Barrier-free Communication
Guanjie Huang, Danny Hin Kwok Tsang, Li Liu
11 Mar 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim, Hyeongseop Rha, Seunghee Han, Wen-Huang Cheng, Y. Ro
03 Jan 2025
RAL:Redundancy-Aware Lipreading Model Based on Differential Learning with Symmetric Views
Zejun Gu, Junxia Jiang
09 Sep 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Young Jin Ahn, Jungwoo Park, Sangha Park, Jonghyun Choi, Kee-Eung Kim
18 Jun 2024
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Chang Sun, Hong Yang, Bo Qin
04 Mar 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo, Seunghee Han, Minsu Kim, Y. Ro
23 Feb 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim, Jeong Hun Yeo, Se Jin Park, J. Choi, Y. Ro
18 Jan 2024
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Computer Vision and Pattern Recognition (CVPR), 2023
J. Choi, Se Jin Park, Minsu Kim, Y. Ro
05 Dec 2023
Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Joanna Hong, Se Jin Park, Y. Ro
23 Oct 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Y. Ro
15 Sep 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ji-Hoon Kim, Jaehun Kim, Joon Son Chung
29 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
IEEE International Conference on Computer Vision (ICCV), 2023
Minsu Kim, Jeong Hun Yeo, J. Choi, Y. Ro
18 Aug 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
IEEE Transactions on Multimedia (IEEE TMM), 2023
Jeong Hun Yeo, Minsu Kim, J. Choi, Dae Hoe Kim, Y. Ro
15 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Minsu Kim, J. Choi, Dahun Kim, Y. Ro
03 Aug 2023
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces
ACM Multimedia (ACM MM), 2023
Ziqiao Peng, Yihao Luo, Yue Shi, Hao-Xuan Xu, Xiangyu Zhu, Jun He, Hongyan Liu, Zhaoxin Fan
19 Jun 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuchen Hu, Ruizhe Li, Cheng Chen, Chengwei Qin, Qiu-shi Zhu, Eng Siong Chng
18 Jun 2023
OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xize Cheng, Tao Jin, Lin Li, Wang Lin, Xinyu Duan, Zhou Zhao
10 Jun 2023
Intelligible Lip-to-Speech Synthesis with Speech Units
Interspeech, 2023
J. Choi, Minsu Kim, Y. Ro
31 May 2023
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Yuchen Hu, Ruizhe Li, Chen Chen, Heqing Zou, Qiu-shi Zhu, Eng Siong Chng
16 May 2023
Multi-Temporal Lip-Audio Memory for Visual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jeong Hun Yeo, Minsu Kim, Y. Ro
08 May 2023
Word-level Persian Lipreading Dataset
International Conference on Computer and Knowledge Engineering (ICCKE), 2022
J. Peymanfard, Ali Lashini, Samin Heydarian, Hossein Zeinali, N. Mozayani
08 Apr 2023
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
Computer Vision and Pattern Recognition (CVPR), 2023
Jiadong Wang, Xinyuan Qian, Malu Zhang, R. Tan, Haizhou Li
29 Mar 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
AAAI Conference on Artificial Intelligence (AAAI), 2023
Minsu Kim, Chae Won Kim, Y. Ro
27 Feb 2023
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Minsu Kim, Joanna Hong, Y. Ro
17 Feb 2023
Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Minsu Kim, Hyungil Kim, Y. Ro
16 Feb 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
Expert Systems with Applications (ESWA), 2023
J. Peymanfard, Samin Heydarian, Ali Lashini, Hossein Zeinali, Mohammad Reza Mohammadi, N. Mozayani
21 Jan 2023
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
IEEE Transactions on Multimedia (IEEE TMM), 2022
Qiu-shi Zhu, Long Zhou, Zi-Hua Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Lirong Dai, Daxin Jiang, Jinyu Li, Furu Wei
21 Nov 2022
Speaker-adaptive Lip Reading with User-dependent Padding
European Conference on Computer Vision (ECCV), 2022
Minsu Kim, Hyunjun Kim, Y. Ro
09 Aug 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Interspeech, 2022
Joanna Hong, Minsu Kim, Daehun Yoo, Y. Ro
13 Jul 2022