1802.06424
End-to-end Audiovisual Speech Recognition
18 February 2018
Stavros Petridis
Themos Stafylakis
Pingchuan Ma
Feipeng Cai
Georgios Tzimiropoulos
M. Pantic
Papers citing "End-to-end Audiovisual Speech Recognition"
50 / 101 papers shown
Title
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Minsu Kim
Joanna Hong
Se Jin Park
Yong Man Ro
CVBM
25
40
0
04 Apr 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
41
10
0
15 Feb 2022
CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
Wenliang Dai
Samuel Cahyawijaya
Tiezheng Yu
Elham J. Barezi
Peng Xu
...
Genta Indra Winata
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
41
11
0
11 Jan 2022
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
35
363
0
02 Nov 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
Advances and Challenges in Deep Lip Reading
Marzieh Oghbaie
Arian Sabaghi
Kooshan Hashemifard
Mohammad Akbari
VLM
30
15
0
15 Oct 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
17
92
0
14 Oct 2021
Audiovisual Singing Voice Separation
Bochen Li
Yuxuan Wang
Z. Duan
29
6
0
01 Jul 2021
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
Rodrigo Mira
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Björn W. Schuller
M. Pantic
29
43
0
27 Apr 2021
Improving state-of-the-art in Detecting Student Engagement with Resnet and TCN Hybrid Network
Ali Abedi
Shehroz S. Khan
19
47
0
20 Apr 2021
Exploring Deep Learning for Joint Audio-Visual Lip Biometrics
Meng Liu
Longbiao Wang
Kong Aik Lee
Hanyi Zhang
Chang Zeng
J. Dang
HAI
30
12
0
17 Apr 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
84
225
0
12 Feb 2021
Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention
Hang Chen
Jun Du
Yu Hu
Lirong Dai
Chin-Hui Lee
Baocai Yin
14
6
0
28 Dec 2020
Lip-reading with Densely Connected Temporal Convolutional Networks
Pingchuan Ma
Yujiang Wang
Jie Shen
Stavros Petridis
M. Pantic
16
55
0
29 Sep 2020
Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement
Hang Chen
Jun Du
Yu Hu
Lirong Dai
Baocai Yin
Chin-Hui Lee
31
19
0
21 Sep 2020
WSRNet: Joint Spotting and Recognition of Handwritten Words
George Retsinas
Giorgos Sfikas
Petros Maragos
6
2
0
17 Aug 2020
Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition
L. Wei
Jie Zhang
Junfeng Hou
Lirong Dai
16
14
0
06 Aug 2020
Towards Practical Lipreading with Distilled and Efficient Models
Pingchuan Ma
Brais Martínez
Stavros Petridis
M. Pantic
26
95
0
13 Jul 2020
Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking
Pengyu Zhang
Jie Zhao
Dong Wang
Huchuan Lu
Xiaoyun Yang
40
138
0
04 Jul 2020
"Notic My Speech" -- Blending Speech Patterns With Multimedia
Dhruva Sahrawat
Yaman Kumar Singla
Shashwat Aggarwal
Yifang Yin
R. Shah
Roger Zimmermann
28
3
0
12 Jun 2020
SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading
Peratham Wiriyathammabhum
23
8
0
21 May 2020
Discriminative Multi-modality Speech Recognition
Bo Xu
Cheng Lu
Yandong Guo
Jacob Wang
18
98
0
12 May 2020
Synchronous Bidirectional Learning for Multilingual Lip Reading
Mingshuang Luo
Shuang Yang
Xilin Chen
Zitao Liu
Shiguang Shan
20
15
0
08 May 2020
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
Abhinav Shukla
Stavros Petridis
M. Pantic
SSL
32
28
0
04 May 2020
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
George Sterpu
Christian Saam
N. Harte
34
28
0
17 Apr 2020
Mutual Information Maximization for Effective Lip Reading
Xingyuan Zhao
Shuang Yang
Shiguang Shan
Xilin Chen
16
58
0
13 Mar 2020
Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading
Mingshuang Luo
Shuang Yang
Shiguang Shan
Xilin Chen
19
41
0
09 Mar 2020
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Yuanhang Zhang
Shuang Yang
Jingyun Xiao
Shiguang Shan
Xilin Chen
10
64
0
06 Mar 2020
Audio-Visual Decision Fusion for WFST-based and seq2seq Models
R. Aralikatti
Sharad Roy
Abhinav Thanda
D. Margam
Pujitha Appan Kandala
Tanay Sharma
S. Venkatesan
19
1
0
29 Jan 2020
Lipreading using Temporal Convolutional Networks
Brais Martínez
Pingchuan Ma
Stavros Petridis
M. Pantic
168
239
0
23 Jan 2020
Towards Pose-invariant Lip-Reading
Shiyang Cheng
Pingchuan Ma
Georgios Tzimiropoulos
Stavros Petridis
Adrian Bulat
Jie Shen
M. Pantic
22
26
0
14 Nov 2019
Multi-Grained Spatio-temporal Modeling for Lip-reading
Chenhao Wang
19
51
0
30 Aug 2019
Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders
M. Sadeghi
Simon Leglaive
Xavier Alameda-Pineda
Laurent Girin
Radu Horaud
DiffM
14
65
0
07 Aug 2019
Video-Driven Speech Reconstruction using Generative Adversarial Networks
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
M. Pantic
GAN
22
49
0
14 Jun 2019
Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition
Pingchuan Ma
Stavros Petridis
M. Pantic
AuLLM
33
10
0
05 Jun 2019
Learning Spatio-Temporal Features with Two-Stream Deep 3D CNNs for Lipreading
Xinshuo Weng
Kris M. Kitani
16
71
0
04 May 2019
End-to-End Visual Speech Recognition for Small-Scale Datasets
Stavros Petridis
Yujiang Wang
Pingchuan Ma
Zuwei Li
M. Pantic
AI4TS
VLM
6
35
0
02 Apr 2019
On the Importance of Video Action Recognition for Visual Lipreading
Xinshuo Weng
17
3
0
22 Mar 2019
Audiovisual Speaker Tracking using Nonlinear Dynamical Systems with Dynamic Stream Weights
C. Schymura
D. Kolossa
15
7
0
14 Mar 2019
An Empirical Analysis of Deep Audio-Visual Models for Speech Recognition
Devesh Walawalkar
Yihui He
R. Pillai
28
1
0
21 Dec 2018
Noise-tolerant Audio-visual Online Person Verification using an Attention-based Neural Network Fusion
Suwon Shon
Tae-Hyun Oh
James R. Glass
11
50
0
27 Nov 2018
Modality Attention for End-to-End Audio-visual Speech Recognition
Pan Zhou
Wenwen Yang
Wei Chen
Yanfeng Wang
Jia Jia
24
69
0
13 Nov 2018
Reconstructing Speech Stimuli From Human Auditory Cortex Activity Using a WaveNet Approach
Ran Wang
Yao Wang
A. Flinker
17
7
0
06 Nov 2018
Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs
Themos Stafylakis
M. H. Khan
Georgios Tzimiropoulos
VLM
8
59
0
03 Nov 2018
Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture
Stavros Petridis
Themos Stafylakis
Pingchuan Ma
Georgios Tzimiropoulos
M. Pantic
14
128
0
28 Sep 2018
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
27
687
0
06 Sep 2018
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
George Sterpu
Christian Saam
N. Harte
39
65
0
05 Sep 2018
Zero-shot keyword spotting for visual speech recognition in-the-wild
Themos Stafylakis
Georgios Tzimiropoulos
27
38
0
23 Jul 2018
Deep Lip Reading: a comparison of models and an online application
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
27
118
0
15 Jun 2018
The Conversation: Deep Audio-Visual Speech Enhancement
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
13
357
0
11 Apr 2018