Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling

4 October 2018
Jaejin Cho
M. Baskar
Ruizhi Li
Sanjeev Khudanpur
Sri Harish Reddy Mallidi
Nelson Yalta
M. Karafiát
Shinji Watanabe
Takaaki Hori

Papers citing "Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling"

50 / 59 papers shown
Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices
Evan King
Adam Sabra
M. Kudlur
James Wang
Pete Warden
82
0
0
02 Sep 2025
Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions
M. Baskar
Andrew Rosenberg
Bhuvana Ramabhadran
Neeraj Gaur
Zhong Meng
181
3
0
20 Jun 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang
Zheshu Song
Jianheng Zhuo
Mingyu Cui
Jinpeng Li
...
Shuai Fan
Kai Yu
Wei Zhang
Guoguo Chen
Xie Chen
509
34
0
17 Jun 2024
Wav2Gloss: Generating Interlinear Glossed Text from Speech
Taiqi He
Kwanghee Choi
Lindia Tjuatja
Nathaniel R. Robinson
Jiatong Shi
Shinji Watanabe
Graham Neubig
David R. Mortensen
Lori S. Levin
VLM
210
5
0
19 Mar 2024
Automatic Speech Recognition using Advanced Deep Learning Approaches: A survey
Hamza Kheddar
Mustapha Hemis
Yassine Himeur
OffRL
259
138
0
02 Mar 2024
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
International Conference on Natural Language and Speech Processing (ICNLSP), 2023
Shuyue Stella Li
Beining Xu
Xiangyu Zhang
Hexin Liu
Wen-Han Chao
Leibny Paola García
SSL
155
4
0
27 Nov 2023
Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages
Interspeech, 2023
Devang Kulshreshtha
Saket Dingliwal
Brady C. Houston
S. Bodapati
217
6
0
03 Jul 2023
Cross-lingual Knowledge Transfer and Iterative Pseudo-labeling for Low-Resource Speech Recognition with Transducers
J. Silovský
Liuhui Deng
Arturo Argueta
Tresi Arvizo
Roger Hsiao
Sasha Kuznietsov
Yiu-Chang Lin
Xiaoqiang Xiao
Yuanyuan Zhang
197
3
0
23 May 2023
Scaling Speech Technology to 1,000+ Languages
Journal of Machine Learning Research (JMLR), 2023
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
389
515
0
22 May 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Neural Information Processing Systems (NeurIPS), 2023
Alexander H. Liu
Heng-Jui Chang
Michael Auli
Wei-Ning Hsu
James R. Glass
464
36
0
17 May 2023
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Knowledge-Based Systems (KBS), 2023
Hamza Kheddar
Yassine Himeur
S. Al-Maadeed
Abbes Amira
F. Bensaali
297
117
0
27 Apr 2023
Deep representation learning: Fundamentals, Perspectives, Applications, and Open Challenges
K. T. Baghaei
Amirreza Payandeh
Pooya Fayyazsanavi
Shahram Rahimi
Zhiqian Chen
Somayeh Bakhtiari Ramezani
FaML
AI4TS
215
10
0
27 Nov 2022
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
Spoken Language Technology Workshop (SLT), 2022
Zhehuai Chen
Ankur Bapna
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Pedro J. Moreno
Nanxin Chen
237
17
0
18 Oct 2022
Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification
Interspeech, 2022
Chuxu Zhang
Yue Liu
Tara N. Sainath
Trevor Strohman
S. Mavandadi
Shuo-yiin Chang
Parisa Haghani
281
34
0
13 Sep 2022
End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting
Computer Speech and Language (CSL), 2022
Thierry Desot
François Portet
Michel Vacher
103
15
0
17 Jul 2022
Non-Linear Pairwise Language Mappings for Low-Resource Multilingual Acoustic Model Fusion
Interspeech, 2022
Muhammad Umar Farooq
Darshan Adiga Haniya Narayana
Thomas Hain
121
2
0
07 Jul 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Interspeech, 2022
Dan Berrebbi
Jiatong Shi
Brian Yan
Osbel López-Francisco
Jonathan D. Amith
Shinji Watanabe
206
30
0
05 Apr 2022
Curriculum optimization for low-resource speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Anastasia Kuznetsova
Anurag Kumar
Jennifer Drexler Fox
Francis M. Tyers
130
3
0
17 Feb 2022
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Samuel Thomas
Hilde Kuehne
...
Yikang Shen
Rogerio Feris
Brian Kingsbury
M. Picheny
James R. Glass
529
8
0
08 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
APSIPA Transactions on Signal and Information Processing (TASIP), 2021
Jinyu Li
VLM
424
425
0
02 Nov 2021
Pseudo-Labeling for Massively Multilingual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Loren Lugosch
Tatiana Likhomanenko
Gabriel Synnaeve
R. Collobert
VLM
297
34
0
30 Oct 2021
Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition
Guolin Zheng
Yubei Xiao
Ke Gong
Pan Zhou
Xiaodan Liang
Liang Lin
194
27
0
19 Sep 2021
Coarse-To-Fine And Cross-Lingual ASR Transfer
Peter Polák
Ondrej Bojar
91
3
0
02 Sep 2021
A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English
Saida Mussakhojayeva
Yerbolat Khassanov
H. A. Varol
153
19
0
03 Aug 2021
Improved Language Identification Through Cross-Lingual Self-Supervised Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Andros Tjandra
Diptanu Gon Choudhury
Frank Zhang
Kritika Singh
Alexis Conneau
Alexei Baevski
Assaf Sela
Yatharth Saraf
Michael Auli
VLM
SSL
182
37
0
08 Jul 2021
Overcoming Domain Mismatch in Low Resource Sequence-to-Sequence ASR Models using Hybrid Generated Pseudotranscripts
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
O. Kimball
157
4
0
14 Jun 2021
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
Neural Information Processing Systems (NeurIPS), 2021
Cheng-I Jeff Lai
Yang Zhang
Alexander H. Liu
Shiyu Chang
Yi-Lun Liao
Yung-Sung Chuang
Kaizhi Qian
Sameer Khurana
David D. Cox
James R. Glass
VLM
295
86
0
10 Jun 2021
Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR
Interspeech, 2021
Shammur A. Chowdhury
A. Hussein
Ahmed Abdelali
Ahmed M. Ali
265
48
0
31 May 2021
Exploiting Adapters for Cross-lingual Low-resource Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Wenxin Hou
Hanlin Zhu
Yidong Wang
Yongfeng Zhang
Tao Qin
Renjun Xu
T. Shinozaki
234
73
0
18 May 2021
XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition
Zi-qiang Zhang
Yan Song
Ming Wu
Xin Fang
Lirong Dai
SSL
116
22
0
15 Mar 2021
Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition
Italian National Conference on Sensors (INS), 2021
A. Laptev
A. Andrusenko
Ivan Podluzhny
Anton Mitrofanov
Ivan Medennikov
Yuri N. Matveev
VLM
129
15
0
12 Mar 2021
End-to-end acoustic modelling for phone recognition of young readers
Speech Communication (Speech Commun.), 2021
Lucile Gelin
Morgane Daniel
J. Pinquier
Thomas Pellegrini
184
17
0
04 Mar 2021
Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Shucong Zhang
Cong-Thanh Do
R. Doddipatla
Erfan Loweimi
P. Bell
Steve Renals
236
2
0
09 Feb 2021
Two-Stage Augmentation and Adaptive CTC Fusion for Improved Robustness of Multi-Stream End-to-End ASR
Spoken Language Technology Workshop (SLT), 2021
Ruizhi Li
Gregory Sell
H. Hermansky
140
2
0
05 Feb 2021
Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2020
Yubei Xiao
Ke Gong
Pan Zhou
Guolin Zheng
Xiaodan Liang
Liang Lin
201
35
0
22 Dec 2020
Transformer-Transducers for Code-Switched Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Siddharth Dalmia
Yuzong Liu
S. Ronanki
Katrin Kirchhoff
222
50
0
30 Nov 2020
Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio
Interspeech, 2020
Manuel Giollo
Deniz Gunceler
Yulan Liu
D. Willett
160
11
0
25 Nov 2020
Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?
Interspeech, 2020
Jialu Li
M. Hasegawa-Johnson
173
5
0
28 Jul 2020
Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters
Vineel Pratap
Anuroop Sriram
Paden Tomasello
Awni Y. Hannun
Vitaliy Liptchinsky
Gabriel Synnaeve
R. Collobert
254
153
0
06 Jul 2020
Unsupervised Cross-lingual Representation Learning for Speech Recognition
Interspeech, 2020
Alexis Conneau
Alexei Baevski
R. Collobert
Abdel-rahman Mohamed
Michael Auli
SSL
362
919
0
24 Jun 2020
Improving Cross-Lingual Transfer Learning for End-to-End Speech Recognition with Speech Translation
Changhan Wang
J. Pino
Jiatao Gu
148
33
0
09 Jun 2020
Fusion Recurrent Neural Network
Yiwen Sun
Yulu Wang
Kun Fu
Zheng Wang
Changshui Zhang
Jieping Ye
92
1
0
07 Jun 2020
Improved acoustic word embeddings for zero-resource languages using multilingual transfer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Herman Kamper
Yevgen Matusevych
Sharon Goldwater
242
21
0
02 Jun 2020
An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling
Interspeech, 2020
Bi-Cheng Yan
Meng-Che Wu
Hsiao-Tsung Hung
Berlin Chen
138
48
0
25 May 2020
DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation
Yi-Chen Chen
Jui-Yang Hsu
Cheng-Kuang Lee
Hung-yi Lee
197
33
0
13 May 2020
Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language
International Conference on Language Resources and Evaluation (LREC), 2020
Kohei Matsuura
Sei Ueno
Masato Mimura
S. Sakai
Tatsuya Kawahara
CVBM
175
15
0
16 Feb 2020
Multilingual acoustic word embedding models for processing zero-resource languages
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Herman Kamper
Yevgen Matusevych
Sharon Goldwater
272
25
0
06 Feb 2020
Meta Learning for End-to-End Low-Resource Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Jui-Yang Hsu
Yuan-Jui Chen
Hung-yi Lee
116
114
0
26 Oct 2019
Analyzing ASR pretraining for low-resource speech-to-text translation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Mihaela C. Stoian
Sameer Bansal
Sharon Goldwater
227
71
0
23 Oct 2019
A practical two-stage training strategy for multi-stream end-to-end speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Ruizhi Li
Gregory Sell
Xiaofei Wang
Shinji Watanabe
H. Hermansky
99
7
0
23 Oct 2019