ResearchTrend.AI
Robust Self-Supervised Audio-Visual Speech Recognition

5 January 2022
Bowen Shi
Wei-Ning Hsu
Abdel-rahman Mohamed

Papers citing "Robust Self-Supervised Audio-Visual Speech Recognition"

13 / 63 papers shown
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie M. Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
25
37
0
21 Nov 2022
Streaming Audio-Visual Speech Recognition with Alignment Regularization
Pingchuan Ma
Niko Moritz
Stavros Petridis
Christian Fuegen
M. Pantic
29
2
0
03 Nov 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
36
54
0
20 Aug 2022
Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos
P. Filntisis
George Retsinas
Foivos Paraperas-Papantoniou
Athanasios Katsamanis
A. Roussos
Petros Maragos
3DH
10
27
0
22 Jul 2022
Visually-aware Acoustic Event Detection using Heterogeneous Graphs
A. Shirian
Krishna Somandepalli
Victor Sanchez
T. Guha
14
3
0
16 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Wei-Ning Hsu
Bowen Shi
SSL
VLM
14
40
0
14 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
8
267
0
13 Jul 2022
MM-ALT: A Multimodal Automatic Lyric Transcription System
Xiangming Gu
Longshen Ou
Danielle Ong
Ye Wang
11
13
0
13 Jul 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
124
339
0
21 May 2022
Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT
Bowen Shi
Abdel-rahman Mohamed
Wei-Ning Hsu
SSL
18
17
0
15 May 2022
Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition
Changfeng Gao
Gaofeng Cheng
Pengyuan Zhang
25
4
0
23 Dec 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
79
221
0
12 Feb 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
214
2,224
0
14 Jun 2018