2202.07428
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
15 February 2022
Zitian Zhang
Jie M. Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
Papers citing
"Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition"
7 / 7 papers shown
Title
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models
Jing-Xuan Zhang
Genshun Wan
Jianqing Gao
Zhen-Hua Ling
47
0
0
09 Feb 2025
Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
Sungnyun Kim
Kangwook Jang
Sangmin Bae
Hoirin Kim
Se-Young Yun
29
3
0
04 Jul 2024
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Andrew Rouditchenko
R. Collobert
Tatiana Likhomanenko
VLM
16
3
0
29 Sep 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
M. Pantic
SSL
26
48
0
12 Dec 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie M. Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
22
37
0
21 Nov 2022
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
79
221
0
12 Feb 2021
Lipreading using Temporal Convolutional Networks
Brais Martínez
Pingchuan Ma
Stavros Petridis
M. Pantic
165
237
0
23 Jan 2020