The Multilingual TEDx Corpus for Speech Recognition and Translation

Interspeech (Interspeech), 2021

2 February 2021

Papers citing "The Multilingual TEDx Corpus for Speech Recognition and Translation"

50 / 76 papers shown

Fast Neural Tangent Kernel Alignment, Norm and Effective Rank via Trace Estimation James Hazelden 64 0 0 13 Nov 2025
Whisper-UT: A Unified Translation Framework for Speech and Text Cihan Xiao Matthew Wiesner Debashish Chakraborty Reno Kriz Keith Cunningham Kenton W. Murray Kevin Duh Luis Tavarez-Arce Paul McNamee Sanjeev Khudanpur 80 0 0 19 Sep 2025
SENSE models: an open source solution for multilingual and multimodal semantic-based tasks Salima Mdhaffar Haroun Elleuch Chaimae Chellaf H. Nguyen Yannick Esteve VLM 112 0 0 15 Sep 2025
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025 Yizhou Peng Bin Wang Yi-Wen Chao Ziyang Ma Haoyang Zhang Hexin Liu Xie Chen Eng Siong Chng ELM 205 1 0 16 Jun 2025
Voice Conversion Improves Cross-Domain Robustness for Spoken Arabic Dialect Identification Badr M. Abdullah Matthew Baas Bernd Möbius Dietrich Klakow 106 1 0 30 May 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation Sungwoo Cho J. Choi Sungnyun Kim Se-Young Yun 289 0 0 14 Mar 2025
Connecting Voices: LoReSpeech as a Low-Resource Speech Parallel Corpus Samy Ouzerrout AuLLM 156 0 0 25 Feb 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition IEEE Signal Processing Letters (IEEE SPL), 2025 Andrew Rouditchenko Saurabhchand Bhati Samuel Thomas Hilde Kuehne Rogerio Feris 483 1 0 03 Feb 2025
Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages Sparsh Jain Ashwin Sankar Devilal Choudhary Dhairya Suman Nikhil Narasimhan Mohammed Safi Ur Rahman Khan Anoop Kunchukuttan Mitesh M. Khapra Raj Dabre 415 3 0 07 Nov 2024
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Marco Gaido Sara Papi L. Bentivogli Alessio Brutti Mauro Cettolo R. Gretter M. Matassoni Mohamed Nabih Matteo Negri 190 12 0 01 Oct 2024
A Large Dataset of Spontaneous Speech with the Accent Spoken in São Paulo for Automatic Speech Recognition Evaluation Brazilian Conference on Intelligent Systems (BRACIS), 2024 Rodrigo Lima S. Leal Arnaldo Candido Junior S. Aluísio 129 2 0 10 Sep 2024
Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond Interspeech (Interspeech), 2024 Beomseok Lee Ioan Calapodescu Marco Gaido Matteo Negri Laurent Besacier AuLLM 220 16 0 07 Aug 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers David Gimeno-Gómez Carlos David Martínez Hinarejos 337 5 0 09 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages William Chen Wangyou Zhang Yifan Peng Xinjian Li Jinchuan Tian Jiatong Shi Xuankai Chang Soumi Maiti Karen Livescu Shinji Watanabe ELM 291 42 0 30 Jun 2024
MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research Song Li Yongbin You Xuezhi Wang Zhengkun Tian Ke Ding Guanglu Wan 179 10 0 26 Jun 2024
FFSTC: Fongbe to French Speech Translation Corpus International Conference on Language Resources and Evaluation (LREC), 2024 D. F. Kponou F. Laleye E. C. Ezin 164 2 0 08 Mar 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing Jeong Hun Yeo Seunghee Han Minsu Kim Y. Ro 260 31 0 23 Feb 2024
AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies José-M. Acosta-Triana David Gimeno-Gómez Carlos David Martínez Hinarejos VLM VGen 265 2 0 20 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? Marco Gaido Sara Papi Matteo Negri L. Bentivogli 404 26 0 19 Feb 2024
A Case Study on Filtering for End-to-End Speech Translation Md Mahfuz Ibn Alam Antonios Anastasopoulos 156 1 0 02 Feb 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation Minsu Kim Jeong Hun Yeo Se Jin Park J. Choi Y. Ro 225 7 0 18 Jan 2024
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation Computer Vision and Pattern Recognition (CVPR), 2023 J. Choi Se Jin Park Minsu Kim Y. Ro 323 16 0 05 Dec 2023
End-to-End Speech-to-Text Translation: A Survey Nivedita Sethiya Chandresh Kumar Maurya 456 13 0 02 Dec 2023
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish IberSPEECH Conference (IberSPEECH), 2022 David Gimeno-Gómez Carlos David Martínez Hinarejos 131 0 0 21 Nov 2023
On-the-Fly Fusion of Large Language Models and Machine Translation Hieu T. Hoang Huda Khayrallah Marcin Junczys-Dowmunt 234 4 0 14 Nov 2023
Automatic Disfluency Detection from Untranscribed Speech IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023 Amrit Romana K. Koishida E. Provost 205 16 0 01 Nov 2023
How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation Marco Gaido Dennis Fucci Matteo Negri L. Bentivogli 236 2 0 23 Oct 2023
Long-form Simultaneous Speech Translation: Thesis Proposal International Joint Conference on Natural Language Processing (IJCNLP), 2023 Peter Polák 3DV 184 3 0 17 Oct 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Jeong Hun Yeo Minsu Kim Shinji Watanabe Y. Ro VLM 197 16 0 15 Sep 2023
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech Computer Speech and Language (CSL), 2023 Titouan Parcollet H. Nguyen Solène Evain Marcely Zanon Boito Adrien Pupier ... François Portet Solange Rossato Fabien Ringeval D. Schwab Laurent Besacier 240 25 0 11 Sep 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge IEEE International Conference on Computer Vision (ICCV), 2023 Minsu Kim Jeong Hun Yeo J. Choi Y. Ro 164 27 0 18 Aug 2023
End-to-End Evaluation for Low-Latency Simultaneous Speech Translation Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Christian Huber Tu Anh Dinh Carlos Mullov Ngoc-Quan Pham Thai-Binh Nguyen ... Danni Liu Zhaolin Li Sai Koneru Jan Niehues A. Waibel 211 10 0 07 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023 Minsu Kim J. Choi Dahun Kim Y. Ro 174 10 0 03 Aug 2023
Towards cross-language prosody transfer for dialog Interspeech (Interspeech), 2023 Jonathan Avila Nigel G. Ward 202 7 0 09 Jul 2023
HK-LegiCoST: Leveraging Non-Verbatim Transcripts for Speech Translation Interspeech (Interspeech), 2023 Cihan Xiao Lin Zhang Jinyi Yang Dongji Gao Kevin Duh Sanjeev Khudanpur 208 2 0 20 Jun 2023
NAVER LABS Europe's Multilingual Speech Translation Systems for the IWSLT 2023 Low-Resource Track International Workshop on Spoken Language Translation (IWSLT), 2023 Edward Gow-Smith Alexandre Berard Marcely Zanon Boito Ioan Calapodescu 231 14 0 13 Jun 2023
The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Mutian He Philip N. Garner 319 5 0 16 May 2023
Learning Cross-lingual Visual Speech Representations IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Andreas Zinonos A. Haliassos Pingchuan Ma Stavros Petridis Maja Pantic SSL 126 10 0 14 Mar 2023
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation Interspeech (Interspeech), 2023 Mohamed Anwar Bowen Shi Vedanuj Goswami Wei-Ning Hsu J. Pino Changhan Wang 194 44 0 01 Mar 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023 Biao Zhang Barry Haddow Rico Sennrich 233 3 0 21 Feb 2023
SegAugment: Maximizing the Utility of Speech Translation Data with Segmentation-based Augmentations Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 Ioannis Tsiamas José A. R. Fonollosa Marta R. Costa-jussá 238 6 0 19 Dec 2022
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Mingda Chen Paul-Ambroise Duquenne Pierre Yves Andrews Justine T. Kao Alexandre Mourachko Holger Schwenk Marta R. Costa-jussá 230 23 0 16 Dec 2022
UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Hirofumi Inaguma Sravya Popuri Ilia Kulikov Peng-Jen Chen Changhan Wang Yu-An Chung Yun Tang Ann Lee Shinji Watanabe J. Pino 280 75 0 15 Dec 2022
Dialogs Re-enacted Across Languages Nigel G. Ward Jonathan Avila Emilia Rivas Divette Marco 182 2 0 18 Nov 2022
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Paul-Ambroise Duquenne Hongyu Gong Ning Dong Jingfei Du Ann Lee Vedanuj Goswami Changhan Wang J. Pino Benoît Sagot Holger Schwenk 240 44 0 08 Nov 2022
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers Interspeech (Interspeech), 2022 Peidong Wang Eric Sun Jian Xue Yu-Huan Wu Long Zhou Yashesh Gaur Shujie Liu Jinyu Li 328 10 0 05 Nov 2022
Improving Speech-to-Speech Translation Through Unlabeled Text IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Xuan-Phi Nguyen Sravya Popuri Changhan Wang Yun Tang Ilia Kulikov Hongyu Gong 175 9 0 26 Oct 2022
Don't Discard Fixed-Window Audio Segmentation in Speech-to-Text Translation Conference on Machine Translation (WMT), 2022 Chantal Amrhein Barry Haddow 158 10 0 24 Oct 2022
Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models L. Gris Arnaldo Cândido Júnior V. G. Santos B. Dias Marli Quadros Leite F. Svartman S. Aluísio 124 3 0 14 Oct 2022
CTC Alignments Improve Autoregressive Translation Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022 Brian Yan Siddharth Dalmia Yosuke Higuchi Graham Neubig Florian Metze A. Black Shinji Watanabe 173 36 0 11 Oct 2022