Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1911.04890
Cited By
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
8 November 2019
Takaki Makino
H. Liao
Yannis Assael
Brendan Shillingford
Basi García
Otavio Braga
Olivier Siohan
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Recurrent Neural Network Transducer for Audio-Visual Speech Recognition"
31 / 31 papers shown
Title
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
118
1
0
03 Feb 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
79
1
0
23 Jan 2025
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
44
3
0
09 Jun 2024
Do VSR Models Generalize Beyond LRS3?
Y. A. D. Djilali
Sanath Narayan
Eustache Le Bihan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
35
4
0
23 Nov 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Cheng Chen
Chengwei Qin
Qiu-shi Zhu
Eng Siong Chng
39
5
0
18 Jun 2023
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Xubo Liu
Egor Lakomkin
Konstantinos Vougioukas
Pingchuan Ma
Honglie Chen
...
Niko Moritz
J. Kolár
Stavros Petridis
M. Pantic
Christian Fuegen
52
19
0
30 Mar 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
29
15
0
29 Mar 2023
The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis
Haoxu Wang
Ming Cheng
Qiang Fu
Ming Li
39
8
0
04 Mar 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
M. Pantic
SSL
45
49
0
12 Dec 2022
Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning
Chen Chen
Yuchen Hu
Qiang Zhang
Heqing Zou
Beier Zhu
Eng Siong Chng
33
26
0
10 Dec 2022
AVATAR submission to the Ego4D AV Transcription Challenge
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
30
0
0
18 Nov 2022
Streaming Audio-Visual Speech Recognition with Alignment Regularization
Pingchuan Ma
Niko Moritz
Stavros Petridis
Christian Fuegen
M. Pantic
37
2
0
03 Nov 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Wei-Ning Hsu
Bowen Shi
SSL
VLM
29
42
0
14 Jul 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Joanna Hong
Minsu Kim
Daehun Yoo
Y. Ro
26
21
0
13 Jul 2022
Is Lip Region-of-Interest Sufficient for Lipreading?
Jing-Xuan Zhang
Genshun Wan
Jia-Yu Pan
24
6
0
28 May 2022
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Otavio Braga
Olivier Siohan
27
7
0
11 May 2022
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Otavio Braga
Olivier Siohan
CVBM
35
8
0
10 May 2022
Audio-visual multi-channel speech separation, dereverberation and recognition
Guinan Li
Jianwei Yu
Jiajun Deng
Xunying Liu
Helen Meng
21
7
0
05 Apr 2022
End-to-end multi-talker audio-visual ASR using an active speaker attention module
R. Rose
Olivier Siohan
18
3
0
01 Apr 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
130
145
0
26 Feb 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
96
40
0
25 Jan 2022
Robust Self-Supervised Audio-Visual Speech Recognition
Bowen Shi
Wei-Ning Hsu
Abdel-rahman Mohamed
39
90
0
05 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
55
306
0
05 Jan 2022
Deep Spoken Keyword Spotting: An Overview
Iván López-Espejo
Zheng-Hua Tan
John H. L. Hansen
Jesper Jensen
21
102
0
20 Nov 2021
Visual Keyword Spotting with Attention
Prajwal K R
Liliane Momeni
Triantafyllos Afouras
Andrew Zisserman
16
13
0
29 Oct 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
17
92
0
14 Oct 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
M. Pantic
SSL
24
53
0
16 Jun 2021
Audio-visual Multi-channel Recognition of Overlapped Speech
Jianwei Yu
Bo Wu
R. Yu
Shi-Xiong Zhang
Lianwu Chen
Yong Xu. Meng Yu
Dan Su
Dong Yu
Xunying Liu
Helen Meng
24
19
0
18 May 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
185
784
0
16 Nov 2016
1