2110.09890
Multi-Modal Pre-Training for Automated Speech Recognition
12 October 2021
David M. Chan
Shalini Ghosh
D. Chakrabarty
Björn Hoffmeister
SSL
Papers citing "Multi-Modal Pre-Training for Automated Speech Recognition"
14 / 14 papers shown
Title
An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems
Hitesh Tulsiani
David M. Chan
Shalini Ghosh
Garima Lalwani
Prabhat Pandey
Ankish Bansal
Sri Garimella
Ariya Rastrow
Björn Hoffmeister
18
0
0
16 Sep 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
27
2
0
28 Mar 2024
Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition
David M. Chan
Shalini Ghosh
Hitesh Tulsiani
Ariya Rastrow
Björn Hoffmeister
15
1
0
04 Jan 2024
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Cheng Chen
Chengwei Qin
Qiu-shi Zhu
E. Chng
16
5
0
18 Jun 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
22
32
0
10 Feb 2023
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie M. Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
22
37
0
21 Nov 2022
Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings
Ethan Chern
Kuo-Hsuan Hung
Yi-Ting Chen
Tassadaq Hussain
M. Gogate
Amir Hussain
Yu Tsao
Jen-Cheng Hou
SSL
8
15
0
31 Oct 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
124
339
0
21 May 2022
Content-Context Factorized Representations for Automated Speech Recognition
David M. Chan
Shalini Ghosh
8
11
0
19 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
30
29
0
13 May 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
29
299
0
05 Jan 2022
Multimodal Self-Supervised Learning of General Audio Representations
Luyu Wang
Pauline Luc
Adrià Recasens
Jean-Baptiste Alayrac
Aaron van den Oord
SSL
70
38
0
26 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
231
573
0
22 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021