Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.06246
Cited By
Jointly Learning Visual and Auditory Speech Representations from Raw Data
12 December 2022
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
M. Pantic
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Jointly Learning Visual and Auditory Speech Representations from Raw Data"
17 / 17 papers shown
Title
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
93
1
0
03 Feb 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
36
2
0
09 Jun 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
25
7
0
14 Mar 2024
Do VSR Models Generalize Beyond LRS3?
Y. A. D. Djilali
Sanath Narayan
Eustache Le Bihan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
22
4
0
23 Nov 2023
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
124
339
0
21 May 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
112
95
0
26 Feb 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
83
40
0
25 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
79
221
0
12 Feb 2021
BYOL works even without batch statistics
Pierre Harvey Richemond
Jean-Bastien Grill
Florent Altché
Corentin Tallec
Florian Strub
...
Samuel L. Smith
Soham De
Razvan Pascanu
Bilal Piot
Michal Valko
SSL
242
114
0
20 Oct 2020
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
Abhinav Shukla
Stavros Petridis
M. Pantic
SSL
27
28
0
04 May 2020
Multi-task self-supervised learning for Robust Speech Recognition
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
SSL
171
288
0
25 Jan 2020
Lipreading using Temporal Convolutional Networks
Brais Martínez
Pingchuan Ma
Stavros Petridis
M. Pantic
165
237
0
23 Jan 2020
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
214
2,224
0
14 Jun 2018
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
162
782
0
16 Nov 2016
1