
Multi-task self-supervised learning for Robust Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

25 January 2020

Mirco Ravanelli

Papers citing "Multi-task self-supervised learning for Robust Speech Recognition"

50 / 167 papers shown

Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition

197

01 Sep 2025

Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing

197

01 Sep 2025

Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models

Pattern Recognition (Pattern Recogn.), 2025

349

09 Feb 2025

LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Soumya Dutta

Sriram Ganapathy

372

20 Jan 2025

SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery

Xingwei Wang

Jie Zhang

280

18 Oct 2024

Audio Explanation Synthesis with Generative Foundation Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Alican Akman

Qiyang Sun

Björn W. Schuller

299

10 Oct 2024

A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework

180

17 Sep 2024

Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

Kong Aik Lee

Eng Siong Chng

301

25 Jun 2024

mHuBERT-147: A Compact Multilingual HuBERT Model

574

10 Jun 2024

A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability

191

21 May 2024

LLAniMAtion: LLAMA Driven Gesture Animation

John T. Windle

Iain Matthews

Sarah Taylor

290

13 May 2024

A Large-Scale Evaluation of Speech Foundation Models

...

Shinji Watanabe

Hung-yi Lee

319

15 Apr 2024

BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

339

02 Apr 2024

SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning

Luca Zampierin

G. B. Hacene

Bac Nguyen

Mirco Ravanelli

325

26 Feb 2024

AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies

José-M. Acosta-Triana

David Gimeno-Gómez

Carlos David Martínez Hinarejos

VLM VGen

328

20 Feb 2024

On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification

Calum Heggan

S. Budgett

Timothy M. Hospedales

Mehrdad Yaghoobi

SSL

356

02 Feb 2024

Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues

David Gimeno-Gómez

Ana-Maria Bucur

Adrian Cosma

Carlos David Martínez Hinarejos

Paolo Rosso

259

05 Jan 2024

FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition

Automatic Speech Recognition & Understanding (ASRU), 2023

Dongning Yang

Wei Wang

Yanmin Qian

353

29 Nov 2023

A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

International Conference on Natural Language and Speech Processing (ICNLSP), 2023

Xiangyu Zhang

222

27 Nov 2023

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

Yu Tsao

275

15 Nov 2023

Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Kong Aik Lee

267

26 Sep 2023

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

Computer Speech and Language (CSL), 2023

...

301

11 Sep 2023

The Quest of Finding the Antidote to Sparse Double Descent

Victor Quétu

Marta Milovanović

353

31 Aug 2023

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Rilin Chen

Yuchen Hu

264

28 Aug 2023

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

Computer Speech and Language (CSL), 2023

Mirco Ravanelli

277

28 Aug 2023

Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping

IEEE International Conference on Computer Vision (ICCV), 2023

Y. A. D. Djilali

Sanath Narayan

Haithem Boussaid

Ebtesam Almazrouei

Merouane Debbah

245

11 Aug 2023

Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Varun Krishna

T. Sai

Sriram Ganapathy

SSL

189

14 Jul 2023

On the Effectiveness of Speech Self-supervised Learning for Music

International Society for Music Information Retrieval Conference (ISMIR), 2023

Ge Zhang

...

Ruibo Liu

Gus Xia

Roger Dannenberg

Yi-Ting Guo

Jie Fu

194

11 Jul 2023

Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

Interspeech (Interspeech), 2023

Jiajun Deng

Guinan Li

Xurong Xie

Zengrui Jin

Mingyu Cui

Tianzi Wang

Shujie Hu

Mengzhe Geng

Xunying Liu

BDL

253

26 Jun 2023

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

Interspeech (Interspeech), 2023

Hejung Yang

Hong-Goo Kang

SSL

221

14 Jun 2023

Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations

Interspeech (Interspeech), 2023

Salah Zaiem

Titouan Parcollet

S. Essid

225

01 Jun 2023

How to Estimate Model Transferability of Pre-Trained Speech Models?

Interspeech (Interspeech), 2023

Chao-Han Huck Yang

503

01 Jun 2023

MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations

Interspeech (Interspeech), 2023

Calum Heggan

Timothy M. Hospedales

S. Budgett

Mehrdad Yaghoobi

SSL

383

29 May 2023

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

Interspeech (Interspeech), 2023

Wangyou Zhang

Y. Qian

286

25 May 2023

On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition

Interspeech (Interspeech), 2023

268

21 May 2023

Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations

International Conference on Machine Learning (ICML), 2023

221

14 May 2023

Continual Learning of Hand Gestures for Human-Robot Interaction

Xavier Cucurull

A. Garrell

175

13 Apr 2023

Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning

447

12 Apr 2023

Nonlinear Independent Component Analysis for Principled Disentanglement in Unsupervised Deep Learning

Patterns (Patterns), 2023

372

29 Mar 2023

Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022

ACM Transactions on Graphics (TOG), 2023

219

15 Mar 2023

Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study

Mirco Ravanelli

325

12 Mar 2023

Multi-Task Self-Supervised Time-Series Representation Learning

Information Sciences (Inf. Sci.), 2023

Heejeong Choi

Pilsung Kang

AI4TS SSL

311

02 Mar 2023

Can we avoid Double Descent in Deep Neural Networks?

IEEE International Conference on Image Processing (ICIP), 2023

Victor Quétu

Enzo Tartaglione

AI4CE

340

26 Feb 2023

Jointly Learning Visual and Auditory Speech Representations from Raw Data

International Conference on Learning Representations (ICLR), 2022

331

12 Dec 2022

An Overview of Indian Spoken Language Recognition from Machine Learning Perspective

Spandan Dey

Md. Sahidullah

G. Saha

230

30 Nov 2022

MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets

Interspeech (Interspeech), 2022

Xie Chen

342

14 Nov 2022

Biased Self-supervised learning for ASR

Interspeech (Interspeech), 2022

216

04 Nov 2022

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Neural Information Processing Systems (NeurIPS), 2022

Kaizhi Qian

460

02 Nov 2022

Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Sathvik Udupa

Siddarth C

P. Ghosh

231

30 Oct 2022

Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Yu-Chen Hu

214

27 Oct 2022