Towards Practical Lipreading with Distilled and Efficient Models


13 July 2020
Pingchuan Ma
Brais Martínez
Stavros Petridis
M. Pantic

Papers citing "Towards Practical Lipreading with Distilled and Efficient Models"

47 / 47 papers shown
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
49
0
0
07 May 2025
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing
Gaoxiang Cong
Liang-Sheng Li
Jiadong Pan
Zhedong Zhang
Amin Beheshti
A. Hengel
Yuankai Qi
Qingming Huang
129
0
0
02 May 2025
LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition
Bowen Hao
Dongliang Zhou
Xiaojie Li
Xingyu Zhang
Liang Xie
Jianlong Wu
Erwei Yin
34
1
0
08 Jan 2025
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong
Jiadong Pan
Liang-Sheng Li
Yuankai Qi
Yuxin Peng
A. Hengel
Jian Yang
Qingming Huang
90
6
0
12 Dec 2024
RAL: Redundancy-Aware Lipreading Model Based on Differential Learning with Symmetric Views
Zejun Gu
Junxia Jiang
25
0
0
09 Sep 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
86
2
0
09 Jul 2024
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization
Adriana Fernandez-Lopez
Honglie Chen
Pingchuan Ma
Lu Yin
Q. Xiao
Stavros Petridis
Shiwei Liu
Maja Pantic
46
2
0
25 Jun 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Young Jin Ahn
Jungwoo Park
Sangha Park
Jonghyun Choi
Kee-Eung Kim
34
7
0
18 Jun 2024
A New Perspective on Smiling and Laughter Detection: Intensity Levels Matter
Hugo Bohy
Kevin El Haddad
Thierry Dutoit
38
6
0
04 Mar 2024
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Chang Sun
Hong Yang
Bo Qin
VLM
27
1
0
04 Mar 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo
Seunghee Han
Minsu Kim
Y. Ro
48
11
0
23 Feb 2024
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing
Gaoxiang Cong
Yuankai Qi
Liang-Sheng Li
Amin Beheshti
Zhedong Zhang
A. Hengel
Ming-Hsuan Yang
Chenggang Yan
Qingming Huang
38
12
0
20 Feb 2024
Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading
Samar Daou
Ahmed Rekik
A. Ben-Hamadou
Abdelaziz Kallel
31
3
0
18 Feb 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim
Jeong Hun Yeo
Se Jin Park
J. Choi
Y. Ro
25
5
0
18 Jan 2024
SFGANS Self-supervised Future Generator for human ActioN Segmentation
Or Berman
Adam Goldbraikh
S. Laufer
24
0
0
31 Dec 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
34
16
0
18 Aug 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
24
18
0
15 Aug 2023
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition
Adriana Fernandez-Lopez
Honglie Chen
Pingchuan Ma
A. Haliassos
Stavros Petridis
M. Pantic
VLM
27
7
0
10 Jul 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
11
9
0
05 Jun 2023
Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model
H. Martel
Julius Richter
Kai Li
Xiaolin Hu
Timo Gerkmann
VLM
14
9
0
31 May 2023
Zero-shot personalized lip-to-speech synthesis with face image based voice control
Zheng-Yan Sheng
Yang Ai
Zhenhua Ling
CVBM
24
5
0
09 May 2023
Multi-Temporal Lip-Audio Memory for Visual Speech Recognition
Jeong Hun Yeo
Minsu Kim
Y. Ro
27
11
0
08 May 2023
Word-level Persian Lipreading Dataset
J. Peymanfard
Ali Lashini
Samin Heydarian
Hossein Zeinali
N. Mozayani
30
5
0
08 Apr 2023
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
Pingchuan Ma
A. Haliassos
Adriana Fernandez-Lopez
Honglie Chen
Stavros Petridis
M. Pantic
27
105
0
25 Mar 2023
Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Meng Liu
Kong Aik Lee
Longbiao Wang
Hanyi Zhang
Chang Zeng
J. Dang
23
10
0
22 Feb 2023
Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
Minsu Kim
Hyungil Kim
Y. Ro
VLM
13
18
0
16 Feb 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Chao Feng
Ziyang Chen
Andrew Owens
31
71
0
04 Jan 2023
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming Yang
Qin Huang
62
25
0
08 Dec 2022
Training Strategies for Improved Lip-reading
Pingchuan Ma
Yujiang Wang
Stavros Petridis
Jie Shen
M. Pantic
22
46
0
03 Sep 2022
Delving into Sequential Patches for Deepfake Detection
Jiazhi Guan
Hang Zhou
Zhibin Hong
Errui Ding
Jingdong Wang
Chengbin Quan
Youjian Zhao
ViT
23
60
0
06 Jul 2022
Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models
Hadeel Mabrouk
Omar Abugabal
Nourhan Sakr
Hesham M. Eraqi
VLM
17
2
0
05 Jun 2022
Deep Learning for Visual Speech Analysis: A Survey
Changchong Sheng
Gangyao Kuang
L. Bai
Chen Hou
Y. Guo
Xin Xu
M. Pietikäinen
Li Liu
VLM
23
33
0
22 May 2022
Audio-Visual Wake Word Spotting System For MISP Challenge 2021
Yanguang Xu
Jianwei Sun
Yang Han
Shuaijiang Zhao
Chaoyang Mei
...
Xiangang Li
Shuran Zhou
Chuandong Xie
Wei Zou
Xiangang Li
11
7
0
19 Apr 2022
Expression-preserving face frontalization improves visually assisted speech processing
Zhiqi Kang
M. Sadeghi
Radu Horaud
Xavier Alameda-Pineda
CVBM
28
8
0
06 Apr 2022
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
Minsu Kim
Jeong Hun Yeo
Yong Man Ro
13
61
0
04 Apr 2022
Self-supervised Transformer for Deepfake Detection
Hanqing Zhao
Wenbo Zhou
Dongdong Chen
Weiming Zhang
Nenghai Yu
ViT
19
34
0
02 Mar 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
114
144
0
26 Feb 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
Zitian Zhang
Jie M. Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
33
10
0
15 Feb 2022
Advances and Challenges in Deep Lip Reading
Marzieh Oghbaie
Arian Sabaghi
Kooshan Hashemifard
Mohammad Akbari
VLM
27
15
0
15 Oct 2021
Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading
Shahd Elashmawy
Marian M. Ramsis
Hesham M. Eraqi
Farah Eldeshnawy
Hadeel Mabrouk
Omar Abugabal
Nourhan Sakr
24
1
0
07 Aug 2021
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
M. Pantic
SSL
18
53
0
16 Jun 2021
Exploring Deep Learning for Joint Audio-Visual Lip Biometrics
Meng Liu
Longbiao Wang
Kong Aik Lee
Hanyi Zhang
Chang Zeng
J. Dang
HAI
24
11
0
17 Apr 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
79
224
0
12 Feb 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
187
198
0
08 Jan 2021
Learn an Effective Lip Reading Model without Pains
Dalu Feng
Shuang Yang
Shiguang Shan
Xilin Chen
22
61
0
15 Nov 2020
Lipreading using Temporal Convolutional Networks
Brais Martínez
Pingchuan Ma
Stavros Petridis
M. Pantic
168
238
0
23 Jan 2020
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
162
784
0
16 Nov 2016