1611.05358
Cited By
Lip Reading Sentences in the Wild
16 November 2016
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
Papers citing
"Lip Reading Sentences in the Wild"
34 / 34 papers shown
Title
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
39
0
0
07 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
35
0
0
06 May 2025
Development and evaluation of a deep learning algorithm for German word recognition from lip movements
Dinh Nam Pham
Torsten Rahne
24
2
0
22 Apr 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction
Moreno La Quatra
Valerio Mario Salerno
Yu Tsao
Sabato Marco Siniscalchi
76
0
0
22 Jan 2025
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
Zhaofeng Lin
Naomi Harte
68
1
0
20 Jan 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
50
3
0
03 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
32
28
0
02 Jan 2025
GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection
Xiaocan Chen
Qilin Yin
Jiarui Liu
Wei Lu
Xiangyang Luo
Jiantao Zhou
CVBM
70
0
0
18 Dec 2024
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
Dragos-Alexandru Boldisor
Stefan Smeu
Dan Oneaţă
Elisabeta Oneata
85
1
0
29 Nov 2024
LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details
Jian Yang
Xukun Wang
Wentao Wang
Guoming Li
Qihang Fang
Ruihong Yuan
Tianyang Wang
Jason Zhaoxin Fan
Yeying Jin
Zhaoxin Fan
VGen
28
1
0
01 Oct 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
20
7
0
14 Mar 2024
Do VSR Models Generalize Beyond LRS3?
Y. A. D. Djilali
Sanath Narayan
Eustache Le Bihan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
14
4
0
23 Nov 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
J. Liu
36
30
0
27 Aug 2023
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming Yang
Qin Huang
40
23
0
08 Dec 2022
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Otavio Braga
Olivier Siohan
CVBM
13
8
0
10 May 2022
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
15
22
0
09 Dec 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
6
23
0
25 Nov 2021
Large-vocabulary Audio-visual Speech Recognition in Noisy Environments
Wentao Yu
Steffen Zeiler
D. Kolossa
41
3
0
10 Sep 2021
Seeing wake words: Audio-visual Keyword Spotting
Liliane Momeni
Triantafyllos Afouras
Themos Stafylakis
Samuel Albanie
Andrew Zisserman
20
43
0
02 Sep 2020
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
George Sterpu
Christian Saam
N. Harte
8
28
0
17 Apr 2020
Audio-visual Recognition of Overlapped speech for the LRS2 dataset
Jianwei Yu
Shi-Xiong Zhang
Jian Wu
Shahram Ghorbani
Bo Wu
Shiyin Kang
Shansong Liu
Xunying Liu
H. Meng
Dong Yu
11
72
0
06 Jan 2020
Multimodal Machine Translation through Visuals and Speech
U. Sulubacak
Ozan Caglayan
Stig-Arne Gronroos
Aku Rouhe
Desmond Elliott
Lucia Specia
Jörg Tiedemann
21
71
0
28 Nov 2019
Lipper: Synthesizing Thy Speech using Multi-View Lipreading
Yaman Kumar Singla
Rohit Jain
Khwaja Mohd. Salik
R. Shah
Yifang Yin
Roger Zimmermann
30
38
0
28 Jun 2019
DeepFakes: a New Threat to Face Recognition? Assessment and Detection
Pavel Korshunov
Sébastien Marcel
PICV
CVBM
26
584
0
20 Dec 2018
Modality Attention for End-to-End Audio-visual Speech Recognition
Pan Zhou
Wenwen Yang
Wei Chen
Yanfeng Wang
Jia Jia
19
69
0
13 Nov 2018
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
George Sterpu
Christian Saam
N. Harte
19
65
0
05 Sep 2018
Lip-Reading Driven Deep Learning Approach for Speech Enhancement
Ahsan Adeel
M. Gogate
Amir Hussain
W. Whitmer
6
61
0
31 Jul 2018
Large-Scale Visual Speech Recognition
Brendan Shillingford
Yannis Assael
Matthew W. Hoffman
T. Paine
Cían Hughes
...
Marie Mulville
Ben Coppin
Ben Laurie
A. Senior
Nando de Freitas
19
152
0
13 Jul 2018
Remote Detection of Idling Cars Using Infrared Imaging and Deep Networks
M. Bastan
Kim-Hui Yap
Lap-Pui Chau
14
6
0
28 Apr 2018
Visual-Only Recognition of Normal, Whispered and Silent Speech
Stavros Petridis
Jie Shen
Doruk Cetin
M. Pantic
13
55
0
18 Feb 2018
Audio to Body Dynamics
Eli Shlizerman
Lucio Dery
Hayden Schoen
Ira Kemelmacher-Shlizerman
VGen
26
152
0
19 Dec 2017
Improved Speech Reconstruction from Silent Video
Ariel Ephrat
Tavi Halperin
Shmuel Peleg
16
89
0
01 Aug 2017
Streaming Architecture for Large-Scale Quantized Neural Networks on an FPGA-Based Dataflow Platform
Chaim Baskin
Natan Liss
Evgenii Zheltonozhskii
A. Bronstein
A. Mendelson
GNN
MQ
10
34
0
31 Jul 2017