ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1809.02108
  4. Cited By
Deep Audio-Visual Speech Recognition

Deep Audio-Visual Speech Recognition

6 September 2018
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
ArXivPDFHTML

Papers citing "Deep Audio-Visual Speech Recognition"

50 / 348 papers shown
Title
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
49
0
0
07 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
115
0
0
06 May 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
49
0
0
29 Apr 2025
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides
Jinghua Zhao
Yuhang Jia
Shiyao Wang
Jiaming Zhou
Hui Wang
Yong Qin
37
0
0
21 Apr 2025
DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis
DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis
Efthymios Georgiou
V. Katsouros
Yannis Avrithis
Alexandros Potamianos
24
1
0
15 Apr 2025
FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency
FluentLip: A Phonemes-Based Two-stage Approach for Audio-Driven Lip Synthesis with Optical Flow Consistency
Shiyan Liu
Rui Qu
Yan Jin
31
0
0
06 Apr 2025
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
Wupeng Wang
Zexu Pan
X. Li
Shuai Wang
Haizhou Li
AI4TS
36
0
0
03 Apr 2025
Detecting Lip-Syncing Deepfakes: Vision Temporal Transformer for Analyzing Mouth Inconsistencies
Detecting Lip-Syncing Deepfakes: Vision Temporal Transformer for Analyzing Mouth Inconsistencies
Soumyya Kanti Datta
Shan Jia
Siwei Lyu
44
0
0
02 Apr 2025
Visual Acoustic Fields
Visual Acoustic Fields
Yuelei Li
Hyunjin Kim
Fangneng Zhan
Ri-Zhao Qiu
Mazeyu Ji
Xiaojun Shan
Xueyan Zou
Paul Liang
Hanspeter Pfister
Xiaolong Wang
47
0
0
31 Mar 2025
STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing
STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing
Zijun Ding
Mingdie Xiong
Congcong Zhu
Jingrun Chen
DiffM
61
0
0
29 Mar 2025
VALLR: Visual ASR Language Model for Lip Reading
VALLR: Visual ASR Language Model for Lip Reading
Marshall Thomas
Edward Fish
Richard Bowden
38
0
0
27 Mar 2025
UniSync: A Unified Framework for Audio-Visual Synchronization
UniSync: A Unified Framework for Audio-Visual Synchronization
Tao Feng
Yifan Xie
Xun Guan
Jiyuan Song
Z. Liu
Fei Ma
Fei Richard Yu
73
1
0
20 Mar 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
51
0
0
14 Mar 2025
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
Umberto Cappellazzo
Minsu Kim
Stavros Petridis
54
0
0
09 Mar 2025
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models
Jing-Xuan Zhang
Genshun Wan
Jianqing Gao
Zhen-Hua Ling
47
0
0
09 Feb 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
113
1
0
03 Feb 2025
EDSep: An Effective Diffusion-Based Method for Speech Source Separation
Jinwei Dong
Xinsheng Wang
Qirong Mao
63
0
0
28 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
Zhaofeng Lin
Naomi Harte
78
1
0
20 Jan 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
52
3
0
03 Jan 2025
Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech
  Translation
Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation
Lucas Goncalves
Prashant Mathur
Xing Niu
Brady Houston
Chandrashekhar Lavania
Srikanth Vishnubhotla
Lijia Sun
Anthony Ferritto
67
0
0
21 Dec 2024
Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data
Bag of Tricks for Multimodal AutoML with Image, Text, and Tabular Data
Zhiqiang Tang
Zihan Zhong
Tong He
Gerald Friedland
83
0
0
19 Dec 2024
The Sound of Water: Inferring Physical Properties from Pouring Liquids
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
Andrew Zisserman
45
0
0
18 Nov 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
D. Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
M. Wang
VLM
58
4
0
18 Nov 2024
DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization
DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization
C. Koutlis
Symeon Papadopoulos
58
2
0
15 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
X. Li
Shuai Wang
H. Li
34
4
0
05 Nov 2024
Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck
Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck
Fevziye Irem Eyiokur
Christian Huber
Thai-Binh Nguyen
T. Nguyen
Fabian Retkowski
Enes Yavuz Ugan
Dogucan Yaman
Alexander Waibel
29
0
0
15 Oct 2024
Character-aware audio-visual subtitling in context
Character-aware audio-visual subtitling in context
Jaesung Huh
Andrew Zisserman
36
0
0
14 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
84
1
0
09 Oct 2024
STNet: Deep Audio-Visual Fusion Network for Robust Speaker Tracking
STNet: Deep Audio-Visual Fusion Network for Robust Speaker Tracking
Yidi Li
Hong Liu
Bing Yang
32
4
0
08 Oct 2024
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Mohan Xu
Kai Li
Guo Chen
Xiaolin Hu
43
0
0
02 Oct 2024
Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic
  Perspective
Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective
Chen Chen
Xiaolou Li
Zehua Liu
Lantian Li
D. Wang
31
0
0
29 Sep 2024
Video-to-Audio Generation with Fine-grained Temporal Semantics
Video-to-Audio Generation with Fine-grained Temporal Semantics
Yuchen Hu
Yu Gu
Chenxing Li
Rilin Chen
Dong Yu
VGen
DiffM
24
1
0
23 Sep 2024
Robust Audio-Visual Speech Enhancement: Correcting Misassignments in
  Complex Environments with Advanced Post-Processing
Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing
Wenze Ren
Kuo-Hsuan Hung
Rong-Yu Chao
YouJin Li
Hsin-Min Wang
Yu Tsao
26
0
0
22 Sep 2024
Self-Supervised Audio-Visual Soundscape Stylization
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
38
4
0
22 Sep 2024
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Yihan Wu
Yifan Peng
Yichen Lu
Xuankai Chang
Ruihua Song
Shinji Watanabe
44
2
0
19 Sep 2024
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
36
9
0
18 Sep 2024
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
34
1
0
13 Sep 2024
Interpretable Convolutional SyncNet
Interpretable Convolutional SyncNet
Sungjoon Park
Jaesub Yun
Donggeon Lee
Minsik Park
52
0
0
02 Sep 2024
DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
Xinyu Wang
Qian Wang
Haolin Huang
Yu Fang
Mengjie Xu
Qian Wang
26
0
0
31 Aug 2024
Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing
  Interaction and Causal Processing
Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing
Qianhui Liu
J. Wang
Yang Wang
Xin Yang
Gang Pan
Haizhou Li
27
0
0
29 Aug 2024
S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High
  Fidelity Talking Head Synthesis
S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis
Dongze Li
Kang Zhao
Wei Wang
Yifeng Ma
Bo Peng
Yingya Zhang
Jing Dong
3DH
CVBM
35
2
0
18 Aug 2024
Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning
Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning
Wenrui Li
Penghong Wang
Ruiqin Xiong
Xiaopeng Fan
32
8
0
11 Jul 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
88
2
0
09 Jul 2024
Learning Video Temporal Dynamics with Cross-Modal Attention for Robust
  Audio-Visual Speech Recognition
Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
Sungnyun Kim
Kangwook Jang
Sangmin Bae
Hoirin Kim
Se-Young Yun
42
3
0
04 Jul 2024
MSRS: Training Multimodal Speech Recognition Models from Scratch with
  Sparse Mask Optimization
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization
Adriana Fernandez-Lopez
Honglie Chen
Pingchuan Ma
Lu Yin
Q. Xiao
Stavros Petridis
Shiwei Liu
Maja Pantic
46
2
0
25 Jun 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End
  Crossmodal Audio Token Synchronization
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Young Jin Ahn
Jungwoo Park
Sangha Park
Jonghyun Choi
Kee-Eung Kim
34
7
0
18 Jun 2024
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech
  Separation By Leveraging Narrow- and Cross-Band Modeling
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling
Vahid Ahmadi Kalkhorani
Cheng Yu
Anurag Kumar
Ke Tan
Buye Xu
DeLiang Wang
32
0
0
17 Jun 2024
A Comprehensive Taxonomy and Analysis of Talking Head Synthesis:
  Techniques for Portrait Generation, Driving Mechanisms, and Editing
A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing
Ming Meng
Yufei Zhao
Bo Zhang
Yonggui Zhu
Weimin Shi
Maxwell Wen
Zhaoxin Fan
VGen
39
1
0
15 Jun 2024
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition
  Challenge
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge
Chen Chen
Zehua Liu
Xiaolou Li
Lantian Li
D. Wang
35
2
0
14 Jun 2024
1234567
Next