2201.10439
v3 (latest)
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Interspeech (Interspeech), 2022
25 January 2022
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
Papers citing
"Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video"
26 / 26 papers shown
Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction
Matthew Kit Khinn Teng
Haibo Zhang
Takeshi Saitoh
200
1
0
25 Jul 2025
CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge
Zehua Liu
Xiaolou Li
Chen Chen
Lantian Li
D. Wang
336
1
0
27 May 2025
VALLR: Visual ASR Language Model for Lip Reading
Marshall Thomas
Edward Fish
Richard Bowden
389
7
0
27 Mar 2025
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models
Pattern Recognition (Pattern Recogn.), 2025
Jing-Xuan Zhang
Genshun Wan
Jianqing Gao
Zhen-Hua Ling
355
13
0
09 Feb 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
IEEE Signal Processing Letters (IEEE SPL), 2025
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
598
4
0
03 Feb 2025
Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment
Joanna Hong
Sanjeel Parekh
Honglie Chen
Jacob Donley
Ke Tan
Buye Xu
Anurag Kumar
286
0
0
30 Jan 2025
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
Neural Information Processing Systems (NeurIPS), 2024
A. Haliassos
Rodrigo Mira
Honglie Chen
Zoe Landgraf
Stavros Petridis
Maja Pantic
SSL
420
16
0
04 Nov 2024
Large Language Models are Strong Audio-Visual Speech Recognition Learners
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
467
40
0
18 Sep 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Young Jin Ahn
Jungwoo Park
Sangha Park
Jonghyun Choi
Kee-Eung Kim
256
15
0
18 Jun 2024
Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder
He Wang
Pengcheng Guo
Xucheng Wan
Huan Zhou
Lei Xie
291
5
0
08 Apr 2024
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
A. Haliassos
Andreas Zinonos
Rodrigo Mira
Stavros Petridis
Maja Pantic
VLM
SSL
AI4TS
344
26
0
02 Apr 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
266
19
0
14 Mar 2024
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Automatic Speech Recognition & Understanding (ASRU), 2023
Jeff Hwang
Moto Hira
Caroline Chen
Xiaohui Zhang
Zhaoheng Ni
...
Yumeng Tao
Robin Scheibler
Samuele Cornell
Sean Kim
Stavros Petridis
308
37
0
27 Oct 2023
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Andrew Rouditchenko
R. Collobert
Tatiana Likhomanenko
VLM
272
6
0
29 Sep 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
IEEE transactions on multimedia (IEEE TMM), 2023
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
260
27
0
15 Aug 2023
Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks
Interspeech (Interspeech), 2023
L. Tóth
Amin Honarmandi Shandiz
G. Gosztolya
T. Csapó
351
9
0
30 May 2023
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Computer Vision and Pattern Recognition (CVPR), 2023
Xubo Liu
Egor Lakomkin
Konstantinos Vougioukas
Pingchuan Ma
Honglie Chen
...
Niko Moritz
J. Kolár
Stavros Petridis
Maja Pantic
Christian Fuegen
513
27
0
30 Mar 2023
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Pingchuan Ma
A. Haliassos
Adriana Fernandez-Lopez
Honglie Chen
Stavros Petridis
Maja Pantic
412
191
0
25 Mar 2023
Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
330
16
0
17 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Automatic Speech Recognition & Understanding (ASRU), 2023
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
429
46
0
10 Feb 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data
International Conference on Learning Representations (ICLR), 2022
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
338
73
0
12 Dec 2022
Streaming Audio-Visual Speech Recognition with Alignment Regularization
Interspeech (Interspeech), 2022
Pingchuan Ma
Niko Moritz
Stavros Petridis
Christian Fuegen
Maja Pantic
258
2
0
03 Nov 2022
Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2022
Jiadong Wang
Xinyuan Qian
Haizhou Li
209
18
0
05 Sep 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Interspeech (Interspeech), 2022
Joanna Hong
Minsu Kim
Daehun Yoo
Y. Ro
304
29
0
13 Jul 2022
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis
ACM Multimedia (ACM MM), 2022
Yongqiang Wang
Zhou Zhao
345
12
0
08 Jul 2022
Visual Speech Recognition for Multiple Languages in the Wild
Nature Machine Intelligence (Nat. Mach. Intell.), 2022
Pingchuan Ma
Stavros Petridis
Maja Pantic
VLM
457
202
0
26 Feb 2022