XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

17 November 2021
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
Naman Goyal
Kritika Singh
Patrick von Platen
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
    SSL

Papers citing "XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale"

50 / 113 papers shown
Title
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
50
0
0
05 May 2025
Language translation, and change of accent for speech-to-speech task using diffusion model
Abhishek Mishra
Ritesh Sur Chowdhury
Vartul Bahuguna
Isha Pandey
Ganesh Ramakrishnan
DiffM
44
0
0
04 May 2025
Weakly-supervised Audio Temporal Forgery Localization via Progressive Audio-language Co-learning Network
Junyan Wu
Wenbo Xu
Wei Lu
Xiangyang Luo
Rui Yang
Shize Guo
34
0
0
03 May 2025
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Tuochao Chen
Qirui Wang
Runlin He
Shyam Gollakota
31
0
0
25 Apr 2025
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
Yassine El Kheir
Youness Samih
Suraj Maharjan
Tim Polzehl
Sebastian Möller
73
1
0
05 Feb 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
111
1
0
03 Feb 2025
Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
Myeonghoon Ryu
June-Woo Kim
Minseok Oh
Suji Lee
Han Park
41
0
0
20 Jan 2025
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
J. Hu
Zuchao Li
Mengjia Shen
Haojun Ai
Sheng Li
Jun Zhang
31
0
0
20 Jan 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
43
0
0
11 Jan 2025
Speech Recognition for Automatically Assessing Afrikaans and isiXhosa Preschool Oral Narratives
C. Jacobs
Annelien Smith
Daleen Klop
Ondřej Klejch
Febe de Wet
Herman Kamper
49
0
0
11 Jan 2025
Towards Unsupervised Speech Recognition Without Pronunciation Models
Junrui Ni
Liming Wang
Yang Zhang
Kaizhi Qian
Heting Gao
Mark Hasegawa-Johnson
Chang-Dong Yoo
SSL
OffRL
86
0
0
10 Jan 2025
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
Jinzuomu Zhong
Korin Richmond
Zhiba Su
Siqi Sun
55
4
0
10 Jan 2025
How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario
Shih-Heng Wang
Zih-Ching Chen
Jiatong Shi
Ming To Chuang
Guan-Ting Lin
Kuan Po Huang
David F. Harwath
Shang-Wen Li
Hung-yi Lee
78
1
0
27 Nov 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
125
2
0
09 Oct 2024
Efficiently Identifying Low-Quality Language Subsets in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset
Farhan Samir
Emily P. Ahn
Shreya Prakash
Márton Soskuthy
Vered Shwartz
Jian Zhu
26
0
0
05 Oct 2024
Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition
Andrés Piñeiro-Martín
C. García-Mateo
Laura Docío-Fernández
María del Carmen López-Pérez
Georg Rehm
32
3
0
25 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Alexander Polonsky
Canh Vu
51
3
0
23 Sep 2024
What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations
Kavya Manohar
Leena G Pillai
29
3
0
04 Sep 2024
Towards scalable efficient on-device ASR with transfer learning
Laxmi Pandey
Ke Li
Jinxi Guo
Debjyoti Paul
Arthur Guo
Jay Mahadeokar
Xuedong Zhang
31
2
0
23 Jul 2024
Cross-Lingual Transfer Learning for Speech Translation
Rao Ma
Yassir Fathullah
Mengjie Qian
Siyuan Tang
Mark J. F. Gales
Kate Knill
20
1
0
01 Jul 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia
Rahmad Mahendra
Salsabil Maulana Akbar
Lester James Validad Miranda
Jennifer Santoso
...
Genta Indra Winata
Ruochen Zhang
Fajri Koto
Zheng-Xin Yong
Samuel Cahyawijaya
84
9
0
14 Jun 2024
Self-Supervised Speech Representations are More Phonetic than Semantic
Kwanghee Choi
Ankita Pasad
Tomohiko Nakamura
Satoru Fukayama
Karen Livescu
Shinji Watanabe
29
14
0
12 Jun 2024
Reading Miscue Detection in Primary School through Automatic Speech Recognition
Lingyun Gao
Cristian Tejedor-García
H. Strik
C. Cucchiarini
32
0
0
11 Jun 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
36
2
0
04 Jun 2024
Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation
Min-Jae Hwang
Ilia Kulikov
Benjamin Peloquin
Hongyu Gong
Peng-Jen Chen
Ann Lee
27
1
0
04 Jun 2024
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Saierdaer Yusuyin
Te Ma
Hao Huang
Wenbo Zhao
Zhijian Ou
44
2
0
04 Jun 2024
HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
Zhongren Dong
Zixing Zhang
Weixiang Xu
Jing Han
Jianjun Ou
Björn W. Schuller
40
1
0
07 May 2024
RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
June-Woo Kim
Miika Toikkanen
Sangmin Bae
Minseok Kim
Ho-Young Jung
30
5
0
05 May 2024
Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition
O. Kundacina
V. Vincan
D. Mišković
BDL
101
0
0
03 May 2024
Low-resource speech recognition and dialect identification of Irish in a multi-task framework
Liam Lonergan
Mengjie Qian
Neasa Ní Chiaráin
Christer Gobl
A. N. Chasaide
38
2
0
02 May 2024
Audio Anti-Spoofing Detection: A Survey
Menglu Li
Yasaman Ahmadiadli
Xiao-Ping Zhang
46
17
0
22 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
36
21
0
03 Apr 2024
A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective
Jeng-Lin Li
Chih-Fan Hsu
Ming-Ching Chang
Wei-Chao Chen
OOD
44
2
0
20 Feb 2024
Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models
Maxime Fily
Guillaume Wisniewski
Severine Guillaume
Gilles Adda
Alexis Michaud
22
1
0
08 Feb 2024
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
Junwen Bai
Bo-wen Li
Qiujia Li
Tara N. Sainath
Trevor Strohman
30
3
0
17 Jan 2024
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
26
1
0
18 Dec 2023
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
Jian Zhu
Changbing Yang
Farhan Samir
Jahurul Islam
32
4
0
14 Nov 2023
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation
Haram Choi
Sang-Hoon Lee
Seong-Whan Lee
DiffM
21
24
0
08 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
27
2
0
01 Nov 2023
A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems
Songbo Hu
Han Zhou
Moy Yuan
Milan Gritta
Guchun Zhang
Ignacio Iacobacci
Anna Korhonen
Ivan Vulić
30
3
0
19 Oct 2023
Optimized Tokenization for Transcribed Error Correction
Tomer Wullach
Shlomo E. Chazan
24
0
0
16 Oct 2023
Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding
Pavel Denisov
Ngoc Thang Vu
29
1
0
09 Oct 2023
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Jiatong Shi
William Chen
Dan Berrebbi
Hsiu-Hsuan Wang
Wei-Ping Huang
...
Yuxun Tang
Shang-Wen Li
Abdelrahman Mohamed
Hung-yi Lee
Shinji Watanabe
LRM
ELM
34
15
0
09 Oct 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words
Robin Algayres
Pablo Diego-Simon
Benoît Sagot
Emmanuel Dupoux
36
1
0
08 Oct 2023
Multimodal Modeling For Spoken Language Identification
Shikhar Bharadwaj
Min Ma
Shikhar Vashishth
Ankur Bapna
Sriram Ganapathy
...
Yu Zhang
D. Esch
Sandy Ritchie
Partha P. Talukdar
Jason Riesa
30
0
0
19 Sep 2023
End-to-End Evaluation for Low-Latency Simultaneous Speech Translation
Christian Huber
Tu Anh Dinh
Carlos Mullov
Ngoc-Quan Pham
Thai-Binh Nguyen
...
Danni Liu
Zhaolin Li
Sai Koneru
J. Niehues
A. Waibel
28
3
0
07 Aug 2023
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus
Tolúlopé Ògúnrèmí
Kólá Túbosún
Aremu Anuoluwapo
Iroro Orife
David Ifeoluwa Adelani
34
6
0
29 Jul 2023
Towards dialect-inclusive recognition in a low-resource language: are balanced corpora the answer?
Liam Lonergan
Mengjie Qian
Neasa Ní Chiaráin
Christer Gobl
A. N. Chasaide
16
6
0
14 Jul 2023
Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition
Theresa Pekarek-Rosin
S. Wermter
VLM
CLL
24
2
0
14 Jul 2023