2002.01320
CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus
International Conference on Language Resources and Evaluation (LREC), 2020
4 February 2020
Changhan Wang
J. Pino
Anne Wu
Jiatao Gu
SLR
Papers citing
"CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus"
50 / 63 papers shown
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
C. Yan
Chunxiang Jin
Dawei Huang
Haibing Yu
Han Peng
...
Yongjie Lyu
Z. He
Zhihao Qiu
Zhiqiang Fang
Ziyuan Huang
AuLLM
446
9
0
26 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM
AuLLM
VGen
VLM
474
8
0
15 Oct 2025
CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching
Heyang Liu
Yuhao Wang
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Y. Wang
Yu Wang
127
2
0
09 Oct 2025
SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation
Haotian Tan
Hiroki Ouchi
S. Sakti
161
0
0
26 Sep 2025
PART: Progressive Alignment Representation Training for Multilingual Speech-To-Text with LLMs
Pei Zhang
Andong Chen
Xi Chen
Baosong Yang
Yang Li
Fei Huang
136
0
0
24 Sep 2025
Whisper-UT: A Unified Translation Framework for Speech and Text
Cihan Xiao
Matthew Wiesner
Debashish Chakraborty
Reno Kriz
Keith Cunningham
Kenton W. Murray
Kevin Duh
Luis Tavarez-Arce
Paul McNamee
Sanjeev Khudanpur
117
0
0
19 Sep 2025
Self-Improvement for Audio Large Language Model using Unlabeled Speech
S. Wang
Xinyuan Chen
Yao Xu
AuLLM
211
8
0
27 Jul 2025
SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ting Xu
Zhichao Huang
Jiankai Sun
Shanbo Cheng
Wai Lam
OffRL
408
10
0
27 May 2025
Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models
Hao Yang
Zhuang Li
Ehsan Shareghi
Gholamreza Haffari
234
2
0
26 May 2025
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
1.1K
5
0
07 May 2025
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Minzhi Li
William B. Held
Michael Joseph Ryan
Kunat Pipatanakul
Potsawee Manakul
Hao Zhu
Diyi Yang
AuLLM
ALM
245
4
0
21 Feb 2025
Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Zhengdong Yang
Qianying Liu
Sheng Li
Fei Cheng
Chenhui Chu
BDL
188
0
0
29 Jan 2025
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci
Marco Gaido
Beatrice Savoldi
Matteo Negri
Mauro Cettolo
L. Bentivogli
631
5
0
03 Nov 2024
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
Shanbo Cheng
Zhichao Huang
Tom Ko
Hang Li
Ningxin Peng
Lu Xu
Qini Zhang
366
13
0
31 Jul 2024
NAIST Simultaneous Speech Translation System for IWSLT 2024
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Tomoya Yanagita
...
Haotian Tan
Makoto Sakai
S. Sakti
Katsuhito Sudoh
Satoshi Nakamura
355
4
0
30 Jun 2024
LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions
International Conference on Machine Learning (ICML), 2024
Victor Agostinelli
Sanghyun Hong
Lizhong Chen
KELM
317
3
0
18 May 2024
FFSTC: Fongbe to French Speech Translation Corpus
International Conference on Language Resources and Evaluation (LREC), 2024
D. F. Kponou
F. Laleye
E. C. Ezin
230
3
0
08 Mar 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MA
AuLLM
ALM
271
196
0
12 Feb 2024
An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis
Via Nielson
Steven Hillis
118
0
0
08 Dec 2023
End-to-End Speech-to-Text Translation: A Survey
Nivedita Sethiya
Chandresh Kumar Maurya
557
13
0
02 Dec 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
267
3
0
01 Nov 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Xiaoshi Zhong
Björn W. Schuller
LM&MA
AuLLM
807
55
0
24 Aug 2023
Recent Advances in Direct Speech-to-text Translation
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Chen Xu
Rong Ye
Qianqian Dong
Chengqi Zhao
Tom Ko
Mingxuan Wang
Tong Xiao
Jingbo Zhu
372
34
0
20 Jun 2023
Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data
International Workshop on Spoken Language Translation (IWSLT), 2023
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Katsuhito Sudoh
Satoshi Nakamura
217
7
0
14 Jun 2023
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana
Nauman Dawalatabad
Antoine Laurent
Luis Vicente
Pablo Gimeno
Victoria Mingote
James R. Glass
VLM
402
2
0
01 Jun 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Claytone Sikasote
Eunice Mukonde
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
229
9
0
26 May 2023
Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation
Interspeech, 2023
Yuta Nishikawa
Satoshi Nakamura
166
4
0
26 May 2023
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Interspeech, 2023
Mohamed Anwar
Bowen Shi
Vedanuj Goswami
Wei-Ning Hsu
J. Pino
Changhan Wang
256
48
0
01 Mar 2023
Jointly Optimizing Translations and Speech Timing to Improve Isochrony in Automatic Dubbing
Alexandra Chronopoulou
Brian Thompson
Prashant Mathur
Yogesh Virkar
Surafel Melaku Lakew
Marcello Federico
194
8
0
25 Feb 2023
Pre-training for Speech Translation: CTC Meets Optimal Transport
International Conference on Machine Learning (ICML), 2023
Hang Le
Hongyu Gong
Changhan Wang
J. Pino
Benjamin Lecouteux
D. Schwab
OT
416
31
0
27 Jan 2023
SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Ioannis Tsiamas
José A. R. Fonollosa
Marta R. Costa-jussá
340
6
0
19 Dec 2022
End-to-End Speech Translation of Arabic to English Broadcast News
Workshop on Arabic Natural Language Processing (WANLP), 2022
Fethi Bougares
Salim Jouili
140
0
0
11 Dec 2022
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Paul-Ambroise Duquenne
Hongyu Gong
Ning Dong
Jingfei Du
Ann Lee
Vedanuj Goswami
Changhan Wang
J. Pino
Benoît Sagot
Holger Schwenk
296
44
0
08 Nov 2022
Make More of Your Data: Minimal Effort Data Augmentation for Automatic Speech Recognition and Translation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
VLM
339
10
0
27 Oct 2022
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Chen Wang
Yuchen Liu
Boxing Chen
Jiajun Zhang
Wei Luo
Zhongqiang Huang
Chengqing Zong
285
11
0
18 Oct 2022
Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models
L. Gris
Arnaldo Cândido Júnior
V. G. Santos
B. Dias
Marli Quadros Leite
F. Svartman
S. Aluísio
191
3
0
14 Oct 2022
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation
Interspeech, 2022
L. T. Nguyen
Nguyen Luong Tran
Long Doan
Manh Luong
Dat Quoc Nguyen
188
5
0
08 Aug 2022
T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Paul-Ambroise Duquenne
Hongyu Gong
Benoît Sagot
Holger Schwenk
258
21
0
24 May 2022
Non-Parametric Domain Adaptation for End-to-End Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yichao Du
Weizhi Wang
Zhirui Zhang
Boxing Chen
Tong Xu
Jun Xie
Enhong Chen
559
18
0
23 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Sameer Khurana
Antoine Laurent
James R. Glass
207
46
0
17 May 2022
Efficient yet Competitive Speech Translation: FBK@IWSLT2022
International Workshop on Spoken Language Translation (IWSLT), 2022
Marco Gaido
Sara Papi
Dennis Fucci
G. Fiameni
Matteo Negri
Marco Turchi
192
20
0
05 May 2022
End-to-End Speech Translation for Code Switched Speech
Findings (Findings), 2022
Orion Weller
Matthias Sperber
Telmo Pires
Hendra Setiawan
Christian Gollan
Dominic Telaar
Matthias Paulik
302
35
0
11 Apr 2022
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Qianying Liu
Zhuo Gong
Zhengdong Yang
Yuhang Yang
Sheng Li
...
Nobuaki Minematsu
Hao-Ming Huang
Fei Cheng
Chenhui Chu
Sadao Kurohashi
227
10
0
08 Apr 2022
An Analysis of Semantically-Aligned Speech-Text Embeddings
Spoken Language Technology Workshop (SLT), 2022
M. Huzaifah
Ivan Kukanov
254
10
0
04 Apr 2022
Prabhupadavani: A Code-mixed Speech Translation Data for 25 Languages
Jivnesh Sandhan
Ayush Daksh
Om Adideva Paranjay
Laxmidhar Behera
Pawan Goyal
190
8
0
27 Jan 2022
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
International Conference on Language Resources and Evaluation (LREC), 2022
Ye Jia
Michelle Tadmor Ramanovich
Quan Wang
Heiga Zen
SLR
349
92
0
11 Jan 2022
CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese
Arnaldo Cândido Júnior
Edresson Casanova
A. S. Soares
F. S. Oliveira
L. Oliveira
...
Daniel Peixoto Pinto da Silva
Fernando Gorgulho Fayet
B. Carlotto
L. Gris
S. Aluísio
246
16
0
14 Oct 2021
Is "moby dick" a Whale or a Bird? Named Entities and Terminology in Speech Translation
Marco Gaido
Susana Rodríguez
Matteo Negri
L. Bentivogli
Marco Turchi
114
11
0
15 Sep 2021
The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 Evaluation
Minghan Wang
Yuxia Wang
Yan Yu
Jiaxin Guo
Yingtao Zhang
...
Shimin Tao
Xingshan Zeng
Liangyou Li
Hao Yang
Ying Qin
116
6
0
09 Aug 2021
FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task
International Workshop on Spoken Language Translation (IWSLT), 2021
Yun Tang
Hongyu Gong
Xian Li
Changhan Wang
J. Pino
Holger Schwenk
Naman Goyal
238
11
0
14 Jul 2021