Large Language Models are Strong Audio-Visual Speech Recognition Learners
arXiv:2409.12319, 18 September 2024
Umberto Cappellazzo, Minsu Kim, Honglie Chen, Pingchuan Ma, Stavros Petridis, Daniele Falavigna, Alessio Brutti, Maja Pantic
Papers citing "Large Language Models are Strong Audio-Visual Speech Recognition Learners" (2 papers)
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko, Saurabhchand Bhati, Samuel Thomas, Hilde Kuehne, Rogerio Feris
03 Feb 2025
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan, V. Trinh, Vivek Voleti, Jacob Whitehill
13 Sep 2024