2002.02848
Cited By
Unsupervised pretraining transfers well across languages
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
7 February 2020
M. Rivière
Armand Joulin
Pierre-Emmanuel Mazaré
Emmanuel Dupoux
SSL
VLM
Papers citing
"Unsupervised pretraining transfers well across languages"
50 / 120 papers shown
Triadic Multi-party Voice Activity Projection for Turn-taking in Spoken Dialogue Systems
Mikey Elmers
K. Inoue
Divesh Lala
Tatsuya Kawahara
170
0
0
10 Jul 2025
Voice Activity Projection Model with Multimodal Encoders
Takeshi Saga
Catherine Pelachaud
216
2
0
04 Jun 2025
What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training
Marianne de Heer Kloots
Hosein Mohebbi
Charlotte Pouw
Gaofei Shen
Willem H. Zuidema
Martijn Bentum
SSL
327
1
0
01 Jun 2025
Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sam O'Connor Russell
Naomi Harte
264
2
0
27 May 2025
Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ibuki Kuroyanagi
Tatsuya Komatsu
SSL
177
2
0
25 May 2025
A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment
K. Inoue
Yuki Okafuji
Jun Baba
Yoshiki Ohira
Katsuya Hyodo
Tatsuya Kawahara
244
3
0
08 Mar 2025
Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
K. Inoue
Divesh Lala
Gabriel Skantze
Tatsuya Kawahara
298
12
0
21 Oct 2024
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
Spoken Language Technology Workshop (SLT), 2024
Andy T. Liu
Yi-Cheng Lin
Haibin Wu
Stefan Winkler
Hung-yi Lee
427
4
0
09 Sep 2024
On the social bias of speech self-supervised models
Interspeech (Interspeech), 2024
Yi-Cheng Lin
Tzu-Quan Lin
Hsi-Che Lin
Andy T. Liu
Hung-yi Lee
460
12
0
07 Jun 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
320
62
0
15 Apr 2024
Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection
K. Inoue
Bing’er Jiang
Erik Ekstedt
Tatsuya Kawahara
Gabriel Skantze
190
22
0
10 Jan 2024
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
Sean Robertson
Ewan Dunbar
SSL
269
1
0
03 Dec 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Automatic Speech Recognition & Understanding (ASRU), 2023
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
313
6
0
09 Oct 2023
Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences
Hugo Malard
Salah Zaiem
Robin Algayres
353
2
0
22 Sep 2023
A study on the impact of Self-Supervised Learning on automatic dysarthric speech assessment
Xavier F. Cadet
Ranya Aloufi
S. Ahmadi-Abhari
Hamed Haddadi
172
8
0
07 Jun 2023
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana
Nauman Dawalatabad
Antoine Laurent
Luis Vicente
Pablo Gimeno
Victoria Mingote
James R. Glass
VLM
416
2
0
01 Jun 2023
MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models
Automatic Speech Recognition & Understanding (ASRU), 2023
Yu-Hsiang Wang
Huan Chen
Kai-Wei Chang
Winston H. Hsu
Hung-yi Lee
581
8
0
30 May 2023
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
Interspeech (Interspeech), 2023
Eklavya Sarkar
Mathew Magimai.-Doss
293
18
0
23 May 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
Interspeech (Interspeech), 2023
Andrew Rouditchenko
Sameer Khurana
Samuel Thomas
Rogerio Feris
Leonid Karlinsky
Hilde Kuehne
David Harwath
Brian Kingsbury
James R. Glass
VLM
334
26
0
21 May 2023
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages
Chris C. Emezue
Sanchit Gandhi
Lewis Tunstall
Abubakar Abid
Josh Meyer
...
Douwe Kiela
Yacine Jernite
Julien Chaumond
Merve Noyan
Omar Sanseviero
212
4
0
22 Mar 2023
TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Hyun Joon Park
Seok Woo Yang
Jin Sob Kim
Wooseok Shin
S. W. Han
242
29
0
16 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
510
366
0
02 Mar 2023
A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit
Mina Huh
Ruchira Ray
Corey Karnei
192
7
0
27 Feb 2023
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
Transactions of the Association for Computational Linguistics (TACL), 2023
Eugene Kharitonov
Damien Vincent
Zalan Borsos
Raphaël Marinier
Sertan Girgin
Olivier Pietquin
Matthew Sharifi
Marco Tagliasacchi
Neil Zeghidour
241
267
0
07 Feb 2023
Supervised Acoustic Embeddings And Their Transferability Across Languages
International Conference on Natural Language and Speech Processing (ICNLSP), 2023
Sreepratha Ram
Hanan Aldarmaki
SSL
178
4
0
03 Jan 2023
Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Amitay Sicherman
Yossi Adi
324
57
0
02 Jan 2023
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
271
17
0
14 Dec 2022
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Sara Atito
Muhammad Awais
Wenwu Wang
Mark D. Plumbley
J. Kittler
ViT
237
22
0
23 Nov 2022
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zili Huang
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yiming Wang
Jinyu Li
Takuya Yoshioka
Xiaofei Wang
Peidong Wang
258
4
0
10 Nov 2022
Self-Supervised Learning for Speech Enhancement through Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Bryce Irvin
Marko Stamenovic
M. Kegler
Li-Chia Yang
217
26
0
04 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Neural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
460
10
0
02 Nov 2022
Audio Language Modeling using Perceptually-Guided Discrete Representations
Felix Kreuk
Yaniv Taigman
Adam Polyak
Jade Copet
Gabriel Synnaeve
Alexandre Défossez
Yossi Adi
397
5
0
02 Nov 2022
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
272
43
0
27 Oct 2022
Full-Stack Bioacoustics: Field Kit to AI to Action (Workshop report)
Dana T Stowell
Caitlin M. Black
Florencia Noriega
Sarab S. Sethi
70
0
0
14 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
Spoken Language Technology Workshop (SLT), 2022
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen-An Li
Hung-yi Lee
Nigel G. Ward
230
63
0
13 Oct 2022
Can we use Common Voice to train a Multi-Speaker TTS system?
Spoken Language Technology Workshop (SLT), 2022
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
274
13
0
12 Oct 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
355
71
0
30 Sep 2022
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
International Society for Music Information Retrieval Conference (ISMIR), 2022
Longshen Ou
Xiangming Gu
Ye Wang
205
26
0
20 Jul 2022
The THUEE System Description for the IARPA OpenASR21 Challenge
Interspeech (Interspeech), 2022
Jing Zhao
Haoyu Wang
Jinpeng Li
Shuzhou Chai
Guan-Bo Wang
Guoguo Chen
Weiqiang Zhang
VLM
175
1
0
29 Jun 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion
Interspeech (Interspeech), 2022
Dacheng Yin
Chuanxin Tang
Yanqing Liu
Xiaoqiang Wang
Zhiyuan Zhao
Yucheng Zhao
Zhiwei Xiong
Sheng Zhao
Chong Luo
305
14
0
28 Jun 2022
Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models
Han Ji
T. Patel
O. Scharenborg
266
10
0
24 Jun 2022
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
Interspeech (Interspeech), 2022
Ruchao Fan
Abeer Alwan
270
39
0
16 Jun 2022
Variable-rate hierarchical CPC leads to acoustic unit discovery in speech
Neural Information Processing Systems (NeurIPS), 2022
Santiago Cuervo
Adrian Łańcucki
R. Marxer
Paweł Rychlikowski
J. Chorowski
SSL
321
18
0
05 Jun 2022
Do self-supervised speech models develop human-like perception biases?
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Juliette Millet
Ewan Dunbar
SSL
187
25
0
31 May 2022
Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR
Qiu-shi Zhu
Jie Zhang
Zitian Zhang
Lirong Dai
236
18
0
26 May 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
770
471
0
21 May 2022
Voice Activity Projection: Self-supervised Learning of Turn-taking Events
Interspeech (Interspeech), 2022
Erik Ekstedt
Gabriel Skantze
230
63
0
19 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Sameer Khurana
Antoine Laurent
James R. Glass
219
46
0
17 May 2022
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning
Interspeech (Interspeech), 2022
Robin Algayres
Adel Nabli
Benoît Sagot
Emmanuel Dupoux
SSL
197
9
0
11 Apr 2022
Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning
Interspeech (Interspeech), 2022
Salah Zaiem
Titouan Parcollet
S. Essid
SSL
151
8
0
08 Apr 2022