
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Computer Vision and Pattern Recognition (CVPR), 2023

30 March 2023

Xubo Liu

Egor Lakomkin

Konstantinos Vougioukas

Papers citing "SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision"


Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction

Matthew Kit Khinn Teng

Haibo Zhang

Takeshi Saitoh

116

25 Jul 2025

VALLR: Visual ASR Language Model for Lip Reading

Marshall Thomas

Edward Fish

Richard Bowden

279

27 Mar 2025

LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

394

08 Jan 2025

Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs

Neural Information Processing Systems (NeurIPS), 2024

335

04 Nov 2024

Tailored Design of Audio-Visual Speech Recognition Models using Branchformers

David Gimeno-Gómez

Carlos David Martínez Hinarejos

413

09 Jul 2024

Contrastive Learning from Synthetic Audio Doppelgängers

International Conference on Learning Representations (ICLR), 2024

Manuel Cherep

Nikhil Singh

344

09 Jun 2024

Exploring the Impact of Synthetic Data for Aerial-view Human Detection

Shuvra S. Bhattacharyya

318

24 May 2024

BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

235

02 Apr 2024

LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Hendrik Laux

Anke Schmeink

134

15 Dec 2023

Do VSR Models Generalize Beyond LRS3?

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Sanath Narayan

Merouane Debbah

187

23 Nov 2023

AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

IEEE Transactions on Multimedia (IEEE TMM), 2023

Jeong Hun Yeo

184

15 Aug 2023

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

Interspeech, 2022

...

394

28 Oct 2022

StyleGAN2 Distillation for Feed-forward Image Manipulation

European Conference on Computer Vision (ECCV), 2020

Yuri Viazovetskyi

V. Ivashkin

Evgenii Kashin

472

148

07 Mar 2020