CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus

International Conference on Language Resources and Evaluation (LREC), 2020
4 February 2020
Changhan Wang
J. Pino
Anne Wu
Jiatao Gu
    SLR

Papers citing "CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus"

50 / 63 papers shown
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
C. Yan
Chunxiang Jin
Dawei Huang
Haibing Yu
Han Peng
...
Yongjie Lyu
Z. He
Zhihao Qiu
Zhiqiang Fang
Ziyuan Huang
AuLLM
446
9
0
26 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM, AuLLM, VGen, VLM
474
8
0
15 Oct 2025
CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching
Heyang Liu
Yuhao Wang
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Y. Wang
Yu Wang
127
2
0
09 Oct 2025
SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation
Haotian Tan
Hiroki Ouchi
S. Sakti
161
0
0
26 Sep 2025
PART: Progressive Alignment Representation Training for Multilingual Speech-To-Text with LLMs
Pei Zhang
Andong Chen
Xi Chen
Baosong Yang
Yang Li
Fei Huang
136
0
0
24 Sep 2025
Whisper-UT: A Unified Translation Framework for Speech and Text
Cihan Xiao
Matthew Wiesner
Debashish Chakraborty
Reno Kriz
Keith Cunningham
Kenton W. Murray
Kevin Duh
Luis Tavarez-Arce
Paul McNamee
Sanjeev Khudanpur
117
0
0
19 Sep 2025
Self-Improvement for Audio Large Language Model using Unlabeled Speech
S. Wang
Xinyuan Chen
Yao Xu
AuLLM
211
8
0
27 Jul 2025
SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ting Xu
Zhichao Huang
Jiankai Sun
Shanbo Cheng
Wai Lam
OffRL
408
10
0
27 May 2025
Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models
Hao Yang
Zhuang Li
Ehsan Shareghi
Gholamreza Haffari
234
2
0
26 May 2025
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
1.1K
5
0
07 May 2025
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Minzhi Li
William B. Held
Michael Joseph Ryan
Kunat Pipatanakul
Potsawee Manakul
Hao Zhu
Diyi Yang
AuLLM, ALM
245
4
0
21 Feb 2025
Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Zhengdong Yang
Qianying Liu
Sheng Li
Fei Cheng
Chenhui Chu
BDL
188
0
0
29 Jan 2025
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
631
5
0
03 Nov 2024
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
Shanbo Cheng
Zhichao Huang
Tom Ko
Hang Li
Ningxin Peng
Lu Xu
Qini Zhang
366
13
0
31 Jul 2024
NAIST Simultaneous Speech Translation System for IWSLT 2024
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Tomoya Yanagita
...
Haotian Tan
Makoto Sakai
S. Sakti
Katsuhito Sudoh
Satoshi Nakamura
355
4
0
30 Jun 2024
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
International Conference on Machine Learning (ICML), 2024
Victor Agostinelli
Sanghyun Hong
Lizhong Chen
KELM
317
3
0
18 May 2024
FFSTC: Fongbe to French Speech Translation Corpus
International Conference on Language Resources and Evaluation (LREC), 2024
D. F. Kponou
F. Laleye
E. C. Ezin
230
3
0
08 Mar 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MA, AuLLM, ALM
271
196
0
12 Feb 2024
An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis
Via Nielson
Steven Hillis
118
0
0
08 Dec 2023
End-to-End Speech-to-Text Translation: A Survey
Nivedita Sethiya
Chandresh Kumar Maurya
557
13
0
02 Dec 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
267
3
0
01 Nov 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Xiaoshi Zhong
Björn W. Schuller
LM&MA, AuLLM
807
55
0
24 Aug 2023
Recent Advances in Direct Speech-to-text Translation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Chen Xu
Rong Ye
Qianqian Dong
Chengqi Zhao
Tom Ko
Mingxuan Wang
Tong Xiao
Jingbo Zhu
372
34
0
20 Jun 2023
Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data
International Workshop on Spoken Language Translation (IWSLT), 2023
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Katsuhito Sudoh
Satoshi Nakamura
217
7
0
14 Jun 2023
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana
Nauman Dawalatabad
Antoine Laurent
Luis Vicente
Pablo Gimeno
Victoria Mingote
James R. Glass
VLM
402
2
0
01 Jun 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Claytone Sikasote
Eunice Mukonde
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
229
9
0
26 May 2023
Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation
Interspeech (Interspeech), 2023
Yuta Nishikawa
Satoshi Nakamura
166
4
0
26 May 2023
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Interspeech (Interspeech), 2023
Mohamed Anwar
Bowen Shi
Vedanuj Goswami
Wei-Ning Hsu
J. Pino
Changhan Wang
256
48
0
01 Mar 2023
Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing
Alexandra Chronopoulou
Brian Thompson
Prashant Mathur
Yogesh Virkar
Surafel Melaku Lakew
Marcello Federico
194
8
0
25 Feb 2023
Pre-training for Speech Translation: CTC Meets Optimal Transport
International Conference on Machine Learning (ICML), 2023
Hang Le
Hongyu Gong
Changhan Wang
J. Pino
Benjamin Lecouteux
D. Schwab
OT
416
31
0
27 Jan 2023
SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ioannis Tsiamas
José A. R. Fonollosa
Marta R. Costa-jussá
340
6
0
19 Dec 2022
End-to-End Speech Translation of Arabic to English Broadcast News
Workshop on Arabic Natural Language Processing (WANLP), 2022
Fethi Bougares
Salim Jouili
140
0
0
11 Dec 2022
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Paul-Ambroise Duquenne
Hongyu Gong
Ning Dong
Jingfei Du
Ann Lee
Vedanuj Goswami
Changhan Wang
J. Pino
Benoît Sagot
Holger Schwenk
296
44
0
08 Nov 2022
Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
VLM
339
10
0
27 Oct 2022
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chen Wang
Yuchen Liu
Boxing Chen
Jiajun Zhang
Wei Luo
Zhongqiang Huang
Chengqing Zong
285
11
0
18 Oct 2022
Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models
L. Gris
Arnaldo Cândido Júnior
V. G. Santos
B. Dias
Marli Quadros Leite
F. Svartman
S. Aluísio
191
3
0
14 Oct 2022
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation
Interspeech (Interspeech), 2022
L. T. Nguyen
Nguyen Luong Tran
Long Doan
Manh Luong
Dat Quoc Nguyen
188
5
0
08 Aug 2022
T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Paul-Ambroise Duquenne
Hongyu Gong
Benoît Sagot
Holger Schwenk
258
21
0
24 May 2022
Non-Parametric Domain Adaptation for End-to-End Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yichao Du
Weizhi Wang
Zhirui Zhang
Boxing Chen
Tong Xu
Jun Xie
Enhong Chen
559
18
0
23 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Sameer Khurana
Antoine Laurent
James R. Glass
207
46
0
17 May 2022
Efficient yet Competitive Speech Translation: FBK@IWSLT2022
International Workshop on Spoken Language Translation (IWSLT), 2022
Marco Gaido
Sara Papi
Dennis Fucci
G. Fiameni
Matteo Negri
Marco Turchi
192
20
0
05 May 2022
End-to-End Speech Translation for Code Switched Speech
Findings (Findings), 2022
Orion Weller
Matthias Sperber
Telmo Pires
Hendra Setiawan
Christian Gollan
Dominic Telaar
Matthias Paulik
302
35
0
11 Apr 2022
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Qianying Liu
Zhuo Gong
Zhengdong Yang
Yuhang Yang
Sheng Li
...
Nobuaki Minematsu
Hao-Ming Huang
Fei Cheng
Chenhui Chu
Sadao Kurohashi
227
10
0
08 Apr 2022
An Analysis of Semantically-Aligned Speech-Text Embeddings
Spoken Language Technology Workshop (SLT), 2022
M. Huzaifah
Ivan Kukanov
254
10
0
04 Apr 2022
Prabhupadavani: A Code-mixed Speech Translation Data for 25 Languages
Jivnesh Sandhan
Ayush Daksh
Om Adideva Paranjay
Laxmidhar Behera
Pawan Goyal
190
8
0
27 Jan 2022
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
International Conference on Language Resources and Evaluation (LREC), 2022
Ye Jia
Michelle Tadmor Ramanovich
Quan Wang
Heiga Zen
SLR
349
92
0
11 Jan 2022
CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese
Arnaldo Cândido Júnior
Edresson Casanova
A. S. Soares
F. S. Oliveira
L. Oliveira
...
Daniel Peixoto Pinto da Silva
Fernando Gorgulho Fayet
B. Carlotto
L. Gris
S. Aluísio
246
16
0
14 Oct 2021
Is "moby dick" a Whale or a Bird? Named Entities and Terminology in Speech Translation
Marco Gaido
Susana Rodríguez
Matteo Negri
L. Bentivogli
Marco Turchi
114
11
0
15 Sep 2021
The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation
Minghan Wang
Yuxia Wang
Yan Yu
Jiaxin Guo
Yingtao Zhang
...
Shimin Tao
Xingshan Zeng
Liangyou Li
Hao Yang
Ying Qin
116
6
0
09 Aug 2021
FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task
International Workshop on Spoken Language Translation (IWSLT), 2021
Yun Tang
Hongyu Gong
Xian Li
Changhan Wang
J. Pino
Holger Schwenk
Naman Goyal
238
11
0
14 Jul 2021