ResearchTrend.AI
Combining Residual Networks with LSTMs for Lipreading

12 March 2017
Themos Stafylakis
Georgios Tzimiropoulos
    VLM

Papers citing "Combining Residual Networks with LSTMs for Lipreading"

50 / 58 papers shown
Title
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
49
0
0
07 May 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
59
3
0
03 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
40
28
0
02 Jan 2025
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
34
16
0
18 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
DiffM
38
1
0
31 Jul 2023
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Rongjie Huang
Huadai Liu
Xize Cheng
Yi Ren
Lin Li
...
Jinzheng He
Lichao Zhang
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
72
8
0
24 May 2023
Word-level Persian Lipreading Dataset
J. Peymanfard
Ali Lashini
Samin Heydarian
Hossein Zeinali
N. Mozayani
30
5
0
08 Apr 2023
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Xubo Liu
Egor Lakomkin
Konstantinos Vougioukas
Pingchuan Ma
Honglie Chen
...
Niko Moritz
J. Kolár
Stavros Petridis
M. Pantic
Christian Fuegen
49
19
0
30 Mar 2023
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
Jiadong Wang
Xinyuan Qian
Malu Zhang
R. Tan
Haizhou Li
EGVM
22
93
0
29 Mar 2023
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection
Kyle Min
Sourya Roy
Subarna Tripathi
T. Guha
Somdeb Majumdar
24
36
0
15 Jul 2022
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
Joanna Hong
Minsu Kim
Y. Ro
CVBM
DiffM
36
8
0
15 Jun 2022
Learning Speaker-specific Lip-to-Speech Generation
Munender Varshney
Ravindra Yadav
Vinay P. Namboodiri
R. Hegde
18
7
0
04 Jun 2022
Is Lip Region-of-Interest Sufficient for Lipreading?
Jing-Xuan Zhang
Genshun Wan
Jia-Yu Pan
24
6
0
28 May 2022
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Minsu Kim
Joanna Hong
Se Jin Park
Yong Man Ro
CVBM
25
40
0
04 Apr 2022
A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning
Gerald Schwiebert
C. Weber
Leyuan Qu
Henrique Siqueira
S. Wermter
24
11
0
27 Feb 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
125
144
0
26 Feb 2022
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
38
23
0
09 Dec 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
12
92
0
14 Oct 2021
USEV: Universal Speaker Extraction with Visual Cue
Zexu Pan
Meng Ge
Haizhou Li
34
41
0
30 Sep 2021
Large-vocabulary Audio-visual Speech Recognition in Noisy Environments
Wentao Yu
Steffen Zeiler
D. Kolossa
64
3
0
10 Sep 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
M. Pantic
SSL
24
53
0
16 Jun 2021
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
Rodrigo Mira
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Björn W. Schuller
M. Pantic
24
43
0
27 Apr 2021
Fusing information streams in end-to-end audio-visual speech recognition
Wentao Yu
Steffen Zeiler
D. Kolossa
81
12
0
19 Apr 2021
Read and Attend: Temporal Localisation in Sign Language Videos
Gül Varol
Liliane Momeni
Samuel Albanie
Triantafyllos Afouras
Andrew Zisserman
SLR
24
40
0
30 Mar 2021
Learn an Effective Lip Reading Model without Pains
Dalu Feng
Shuang Yang
Shiguang Shan
Xilin Chen
30
61
0
15 Nov 2020
Lip-reading with Densely Connected Temporal Convolutional Networks
Pingchuan Ma
Yujiang Wang
Jie Shen
Stavros Petridis
M. Pantic
8
55
0
29 Sep 2020
Correlating Subword Articulation with Lip Shapes for Embedding Aware Audio-Visual Speech Enhancement
Hang Chen
Jun Du
Yu Hu
Lirong Dai
Baocai Yin
Chin-Hui Lee
25
19
0
21 Sep 2020
Seeing wake words: Audio-visual Keyword Spotting
Liliane Momeni
Triantafyllos Afouras
Themos Stafylakis
Samuel Albanie
Andrew Zisserman
44
43
0
02 Sep 2020
Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition
L. Wei
Jie Zhang
Junfeng Hou
Lirong Dai
16
14
0
06 Aug 2020
Multimodal Integration for Large-Vocabulary Audio-Visual Speech Recognition
Wentao Yu
Steffen Zeiler
D. Kolossa
30
10
0
28 Jul 2020
Towards Practical Lipreading with Distilled and Efficient Models
Pingchuan Ma
Brais Martínez
Stavros Petridis
M. Pantic
26
95
0
13 Jul 2020
"Notic My Speech" -- Blending Speech Patterns With Multimedia
Dhruva Sahrawat
Yaman Kumar Singla
Shashwat Aggarwal
Yifang Yin
R. Shah
Roger Zimmermann
28
3
0
12 Jun 2020
SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading
Peratham Wiriyathammabhum
18
8
0
21 May 2020
Discriminative Multi-modality Speech Recognition
Bo Xu
Cheng Lu
Yandong Guo
Jacob Wang
18
98
0
12 May 2020
Neural Spatio-Temporal Beamformer for Target Speech Separation
Yong-mei Xu
Meng Yu
Shi-Xiong Zhang
Lianwu Chen
Chao Weng
Jianming Liu
Dong Yu
20
41
0
08 May 2020
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
George Sterpu
Christian Saam
N. Harte
34
28
0
17 Apr 2020
Vocoder-Based Speech Synthesis from Silent Videos
Daniel Michelsanti
Olga Slizovskaia
G. Haro
Emilia Gómez
Zheng-Hua Tan
Jesper Jensen
31
31
0
06 Apr 2020
Mutual Information Maximization for Effective Lip Reading
Xingyuan Zhao
Shuang Yang
Shiguang Shan
Xilin Chen
16
58
0
13 Mar 2020
Deformation Flow Based Two-Stream Network for Lip Reading
Jingyun Xiao
Shuang Yang
Yuanhang Zhang
Shiguang Shan
Xilin Chen
17
64
0
12 Mar 2020
Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading
Mingshuang Luo
Shuang Yang
Shiguang Shan
Xilin Chen
19
41
0
09 Mar 2020
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition
Yuanhang Zhang
Shuang Yang
Jingyun Xiao
Shiguang Shan
Xilin Chen
10
64
0
06 Mar 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
MMTM: Multimodal Transfer Module for CNN Fusion
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
18
277
0
20 Nov 2019
MobiVSR: A Visual Speech Recognition Solution for Mobile Devices
Nilay Shrivastava
Astitwa Saxena
Yaman Kumar Singla
Preeti Kaur
Debanjan Mahata
R. Shah
19
3
0
10 May 2019
Learning Spatio-Temporal Features with Two-Stream Deep 3D CNNs for Lipreading
Xinshuo Weng
Kris M. Kitani
16
71
0
04 May 2019
Time Domain Audio Visual Speech Separation
Jian Wu
Yong-mei Xu
Shi-Xiong Zhang
Lianwu Chen
Meng Yu
Lei Xie
Dong Yu
22
114
0
07 Apr 2019
3D Feature Pyramid Attention Module for Robust Visual Speech Recognition
Jingyun Xiao
19
2
0
15 Oct 2018
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
22
687
0
06 Sep 2018