Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.09311
Cited By
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
18 August 2023
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge"
18 / 18 papers shown
Title
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
51
0
0
14 Mar 2025
Large Multimodal Models for Low-Resource Languages: A Survey
Marian Lupascu
Ana-Cristina Rogoz
Mihai-Sorin Stupariu
Radu Tudor Ionescu
51
1
0
08 Feb 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
93
1
0
03 Feb 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
52
3
0
03 Jan 2025
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
86
2
0
09 Jul 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Young Jin Ahn
Jungwoo Park
Sangha Park
Jonghyun Choi
Kee-Eung Kim
29
7
0
18 Jun 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim
Jeong Hun Yeo
Se Jin Park
J. Choi
Y. Ro
14
5
0
18 Jan 2024
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
14
12
0
05 Dec 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
21
12
0
15 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
16
6
0
15 Sep 2023
Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
Joanna Hong
Minsu Kim
J. Choi
Y. Ro
27
19
0
15 Mar 2023
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
112
144
0
26 Feb 2022
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
79
221
0
12 Feb 2021
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
174
336
0
01 Feb 2021
Lipreading using Temporal Convolutional Networks
Brais Martínez
Pingchuan Ma
Stavros Petridis
M. Pantic
168
237
0
23 Jan 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
922
0
24 Sep 2019
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
214
2,224
0
14 Jun 2018
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
162
782
0
16 Nov 2016
1