Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

arXiv: 2403.19822

28 March 2024
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh

Papers citing "Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition"

12 / 12 papers shown
An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems
Hitesh Tulsiani
David M. Chan
Shalini Ghosh
Garima Lalwani
Prabhat Pandey
Ankish Bansal
Sri Garimella
Ariya Rastrow
Björn Hoffmeister
16
0
0
16 Sep 2024
Data Diversity Matters for Robust Instruction Tuning
Alexander Bukharin
Tuo Zhao
65
35
0
21 Nov 2023
Learning to Discern: Imitating Heterogeneous Human Demonstrations with Preference and Representation Learning
Sachit Kuhar
Shuo Cheng
Shivang Chopra
Matthew Bronars
Danfei Xu
24
8
0
22 Oct 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
77
249
0
02 Mar 2023
Unified Modeling of Multi-Domain Multi-Device ASR Systems
Soumyajit Mitra
Swayambhu Nath Ray
Bharat Padi
Arunasish Sen
Raghavendra Bilgi
Harish Arsikere
Shalini Ghosh
A. Srinivasamurthy
Sri Garimella
13
3
0
13 May 2022
Multilingual Speech Recognition using Knowledge Transfer across Learning Processes
Rimita Lahiri
K. Kumatani
Eric Sun
Yao Qian
36
6
0
15 Oct 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
245
554
0
28 Sep 2021
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
Ajay Mandlekar
Danfei Xu
J. Wong
Soroush Nasiriany
Chen Wang
Rohun Kulkarni
Li Fei-Fei
Silvio Savarese
Yuke Zhu
Roberto Martín-Martín
OffRL
139
461
0
06 Aug 2021
Multimodal Self-Supervised Learning of General Audio Representations
Luyu Wang
Pauline Luc
Adrià Recasens
Jean-Baptiste Alayrac
Aaron van den Oord
SSL
70
38
0
26 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
231
573
0
22 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
214
1,954
0
14 Jun 2018