Unsupervised pretraining transfers well across languages

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

7 February 2020

M. Rivière

Armand Joulin

Pierre-Emmanuel Mazaré

Papers citing "Unsupervised pretraining transfers well across languages"

Triadic Multi-party Voice Activity Projection for Turn-taking in Spoken Dialogue Systems

170

10 Jul 2025

Voice Activity Projection Model with Multimodal Encoders

Takeshi Saga

Catherine Pelachaud

216

04 Jun 2025

What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training

Marianne de Heer Kloots

327

01 Jun 2025

Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Sam O'Connor Russell

Naomi Harte

264

27 May 2025

Self-supervised learning method using multiple sampling strategies for general-purpose audio representation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Ibuki Kuroyanagi

Tatsuya Komatsu

SSL

177

25 May 2025

A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment

244

08 Mar 2025

Yeah, Un, Oh: Continuous and Real-time Backchannel Prediction with Fine-tuning of Voice Activity Projection

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

298

21 Oct 2024

Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget

Spoken Language Technology Workshop (SLT), 2024

Andy T. Liu

Yi-Cheng Lin

Haibin Wu

Stefan Winkler

Hung-yi Lee

427

09 Sep 2024

On the social bias of speech self-supervised models

Interspeech (Interspeech), 2024

460

07 Jun 2024

A Large-Scale Evaluation of Speech Foundation Models

...

Shinji Watanabe

Hung-yi Lee

320

15 Apr 2024

Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection

190

10 Jan 2024

Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training

Sean Robertson

Ewan Dunbar

SSL

269

03 Dec 2023

Few-Shot Spoken Language Understanding via Joint Speech-Text Models

Automatic Speech Recognition & Understanding (ASRU), 2023

313

09 Oct 2023

Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences

Hugo Malard

Salah Zaiem

Robin Algayres

353

22 Sep 2023

A study on the impact of Self-Supervised Learning on automatic dysarthric speech assessment

Xavier F. Cadet

Ranya Aloufi

S. Ahmadi-Abhari

Hamed Haddadi

172

07 Jun 2023

Improved Cross-Lingual Transfer Learning For Automatic Speech Translation

416

01 Jun 2023

MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models

Automatic Speech Recognition & Understanding (ASRU), 2023

581

30 May 2023

Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?

Interspeech (Interspeech), 2023

Eklavya Sarkar

Mathew Magimai.-Doss

293

23 May 2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

Interspeech (Interspeech), 2023

334

21 May 2023

AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages

...

Douwe Kiela

212

22 Mar 2023

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

242

16 Mar 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

...

510

366

02 Mar 2023

A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit

Mina Huh

Ruchira Ray

Corey Karnei

192

27 Feb 2023

Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision

Transactions of the Association for Computational Linguistics (TACL), 2023

Sertan Girgin

Olivier Pietquin

Matthew Sharifi

Marco Tagliasacchi

Neil Zeghidour

241

267

07 Feb 2023

Supervised Acoustic Embeddings And Their Transferability Across Languages

International Conference on Natural Language and Speech Processing (ICNLSP), 2023

Sreepratha Ram

Hanan Aldarmaki

SSL

178

03 Jan 2023

Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Amitay Sicherman

Yossi Adi

324

02 Jan 2023

Disentangling Prosody Representations with Unsupervised Speech Reconstruction

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

Leyuan Qu

Taiha Li

C. Weber

Theresa Pekarek-Rosin

F. Ren

S. Wermter

271

14 Dec 2022

ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

237

23 Nov 2022

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

258

10 Nov 2022

Self-Supervised Learning for Speech Enhancement through Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

217

04 Nov 2022

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Neural Information Processing Systems (NeurIPS), 2022

Kaizhi Qian

460

02 Nov 2022

Audio Language Modeling using Perceptually-Guided Discrete Representations

Yossi Adi

397

02 Nov 2022

Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

272

27 Oct 2022

Full-Stack Bioacoustics: Field Kit to AI to Action (Workshop report)

14 Oct 2022

On the Utility of Self-supervised Models for Prosody-related Tasks

Spoken Language Technology Workshop (SLT), 2022

Guan-Ting Lin

230

13 Oct 2022

Can we use Common Voice to train a Multi-Speaker TTS system?

Spoken Language Technology Workshop (SLT), 2022

Sewade Ogun

Vincent Colotte

Emmanuel Vincent

274

12 Oct 2022

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

...

355

30 Sep 2022

Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription

International Society for Music Information Retrieval Conference (ISMIR), 2022

Longshen Ou

Xiangming Gu

Ye Wang

205

20 Jul 2022

The THUEE System Description for the IARPA OpenASR21 Challenge

Interspeech (Interspeech), 2022

175

29 Jun 2022

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

Interspeech (Interspeech), 2022

305

28 Jun 2022

Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models

Han Ji

T. Patel

O. Scharenborg

266

24 Jun 2022

DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR

Interspeech (Interspeech), 2022

Ruchao Fan

Abeer Alwan

270

16 Jun 2022

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

Neural Information Processing Systems (NeurIPS), 2022

321

05 Jun 2022

Do self-supervised speech models develop human-like perception biases?

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Juliette Millet

Ewan Dunbar

SSL

187

31 May 2022

Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR

236

26 May 2022

Self-Supervised Speech Representation Learning: A Review

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

Abdel-rahman Mohamed

Hung-yi Lee

Lasse Borgholt

Jakob Drachmann Havtorn

...

770

471

21 May 2022

Voice Activity Projection: Self-supervised Learning of Turn-taking Events

Interspeech (Interspeech), 2022

Erik Ekstedt

Gabriel Skantze

230

19 May 2022

SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

Sameer Khurana

Antoine Laurent

James R. Glass

219

17 May 2022

Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning

Interspeech (Interspeech), 2022

197

11 Apr 2022

Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning

Interspeech (Interspeech), 2022

151

08 Apr 2022