End-to-end Audiovisual Speech Recognition

18 February 2018
Stavros Petridis
Themos Stafylakis
Pingchuan Ma
Feipeng Cai
Georgios Tzimiropoulos
M. Pantic

Papers citing "End-to-end Audiovisual Speech Recognition"

50 / 101 papers shown
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Minsu Kim
Joanna Hong
Se Jin Park
Yong Man Ro
CVBM
25
40
0
04 Apr 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
41
10
0
15 Feb 2022
CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
Wenliang Dai
Samuel Cahyawijaya
Tiezheng Yu
Elham J. Barezi
Peng Xu
...
Genta Indra Winata
Qifeng Chen
Xiaojuan Ma
Bertram E. Shi
Pascale Fung
41
11
0
11 Jan 2022
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
35
363
0
02 Nov 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
Advances and Challenges in Deep Lip Reading
Marzieh Oghbaie
Arian Sabaghi
Kooshan Hashemifard
Mohammad Akbari
VLM
30
15
0
15 Oct 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
17
92
0
14 Oct 2021
Audiovisual Singing Voice Separation
Bochen Li
Yuxuan Wang
Z. Duan
29
6
0
01 Jul 2021
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
Rodrigo Mira
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Björn W. Schuller
M. Pantic
29
43
0
27 Apr 2021
Improving state-of-the-art in Detecting Student Engagement with Resnet and TCN Hybrid Network
Ali Abedi
Shehroz S. Khan
19
47
0
20 Apr 2021
Exploring Deep Learning for Joint Audio-Visual Lip Biometrics
Meng Liu
Longbiao Wang
Kong Aik Lee
Hanyi Zhang
Chang Zeng
J. Dang
HAI
30
12
0
17 Apr 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
84
225
0
12 Feb 2021
Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention
Hang Chen
Jun Du
Yu Hu
Lirong Dai
Chin-Hui Lee
Baocai Yin
14
6
0
28 Dec 2020
Lip-reading with Densely Connected Temporal Convolutional Networks
Pingchuan Ma
Yujiang Wang
Jie Shen
Stavros Petridis
M. Pantic
16
55
0
29 Sep 2020
Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement
Hang Chen
Jun Du
Yu Hu
Lirong Dai
Baocai Yin
Chin-Hui Lee
31
19
0
21 Sep 2020
WSRNet: Joint Spotting and Recognition of Handwritten Words
George Retsinas
Giorgos Sfikas
Petros Maragos
6
2
0
17 Aug 2020
Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition
L. Wei
Jie Zhang
Junfeng Hou
Lirong Dai
16
14
0
06 Aug 2020
Towards Practical Lipreading with Distilled and Efficient Models
Pingchuan Ma
Brais Martínez
Stavros Petridis
M. Pantic
26
95
0
13 Jul 2020
Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking
Pengyu Zhang
Jie Zhao
Dong Wang
Huchuan Lu
Xiaoyun Yang
40
138
0
04 Jul 2020
"Notic My Speech" -- Blending Speech Patterns With Multimedia
Dhruva Sahrawat
Yaman Kumar Singla
Shashwat Aggarwal
Yifang Yin
R. Shah
Roger Zimmermann
28
3
0
12 Jun 2020
SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading
Peratham Wiriyathammabhum
23
8
0
21 May 2020
Discriminative Multi-modality Speech Recognition
Bo Xu
Cheng Lu
Yandong Guo
Jacob Wang
18
98
0
12 May 2020
Synchronous Bidirectional Learning for Multilingual Lip Reading
Mingshuang Luo
Shuang Yang
Xilin Chen
Zitao Liu
Shiguang Shan
20
15
0
08 May 2020
Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
Abhinav Shukla
Stavros Petridis
M. Pantic
SSL
32
28
0
04 May 2020
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
George Sterpu
Christian Saam
N. Harte
34
28
0
17 Apr 2020
Mutual Information Maximization for Effective Lip Reading
Xingyuan Zhao
Shuang Yang
Shiguang Shan
Xilin Chen
16
58
0
13 Mar 2020
Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading
Mingshuang Luo
Shuang Yang
Shiguang Shan
Xilin Chen
19
41
0
09 Mar 2020
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Yuanhang Zhang
Shuang Yang
Jingyun Xiao
Shiguang Shan
Xilin Chen
10
64
0
06 Mar 2020
Audio-Visual Decision Fusion for WFST-based and seq2seq Models
R. Aralikatti
Sharad Roy
Abhinav Thanda
D. Margam
Pujitha Appan Kandala
Tanay Sharma
S. Venkatesan
19
1
0
29 Jan 2020
Lipreading using Temporal Convolutional Networks
Brais Martínez
Pingchuan Ma
Stavros Petridis
M. Pantic
168
239
0
23 Jan 2020
Towards Pose-invariant Lip-Reading
Shiyang Cheng
Pingchuan Ma
Georgios Tzimiropoulos
Stavros Petridis
Adrian Bulat
Jie Shen
M. Pantic
22
26
0
14 Nov 2019
Multi-Grained Spatio-temporal Modeling for Lip-reading
Chenhao Wang
19
51
0
30 Aug 2019
Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders
M. Sadeghi
Simon Leglaive
Xavier Alameda-Pineda
Laurent Girin
Radu Horaud
DiffM
14
65
0
07 Aug 2019
Video-Driven Speech Reconstruction using Generative Adversarial Networks
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
M. Pantic
GAN
22
49
0
14 Jun 2019
Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition
Pingchuan Ma
Stavros Petridis
M. Pantic
AuLLM
33
10
0
05 Jun 2019
Learning Spatio-Temporal Features with Two-Stream Deep 3D CNNs for Lipreading
Xinshuo Weng
Kris M. Kitani
16
71
0
04 May 2019
End-to-End Visual Speech Recognition for Small-Scale Datasets
Stavros Petridis
Yujiang Wang
Pingchuan Ma
Zuwei Li
M. Pantic
AI4TS
VLM
6
35
0
02 Apr 2019
On the Importance of Video Action Recognition for Visual Lipreading
Xinshuo Weng
17
3
0
22 Mar 2019
Audiovisual Speaker Tracking using Nonlinear Dynamical Systems with Dynamic Stream Weights
C. Schymura
D. Kolossa
15
7
0
14 Mar 2019
An Empirical Analysis of Deep Audio-Visual Models for Speech Recognition
Devesh Walawalkar
Yihui He
R. Pillai
28
1
0
21 Dec 2018
Noise-tolerant Audio-visual Online Person Verification using an Attention-based Neural Network Fusion
Suwon Shon
Tae-Hyun Oh
James R. Glass
11
50
0
27 Nov 2018
Modality Attention for End-to-End Audio-visual Speech Recognition
Pan Zhou
Wenwen Yang
Wei Chen
Yanfeng Wang
Jia Jia
24
69
0
13 Nov 2018
Reconstructing Speech Stimuli From Human Auditory Cortex Activity Using a WaveNet Approach
Ran Wang
Yao Wang
A. Flinker
17
7
0
06 Nov 2018
Pushing the boundaries of audiovisual word recognition using Residual Networks and LSTMs
Themos Stafylakis
M. H. Khan
Georgios Tzimiropoulos
VLM
8
59
0
03 Nov 2018
Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture
Stavros Petridis
Themos Stafylakis
Pingchuan Ma
Georgios Tzimiropoulos
M. Pantic
14
128
0
28 Sep 2018
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
27
687
0
06 Sep 2018
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
George Sterpu
Christian Saam
N. Harte
39
65
0
05 Sep 2018
Zero-shot keyword spotting for visual speech recognition in-the-wild
Themos Stafylakis
Georgios Tzimiropoulos
27
38
0
23 Jul 2018
Deep Lip Reading: a comparison of models and an online application
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
27
118
0
15 Jun 2018
The Conversation: Deep Audio-Visual Speech Enhancement
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
13
357
0
11 Apr 2018