arXiv: 2001.09239
Multi-task self-supervised learning for Robust Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
25 January 2020
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
    SSL

Papers citing "Multi-task self-supervised learning for Robust Speech Recognition"

50 / 167 papers shown
Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition
Shuangyuan Chen
Shuang Wei
Dongxing Xu
Yanhua Long
197
0
0
01 Sep 2025
Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing
Zihao Wang
Enneng Yang
L. Yin
Shiwei Liu
Li Shen
FedML MoMe
197
1
0
01 Sep 2025
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models
Pattern Recognition (Pattern Recogn.), 2025
Jing-Xuan Zhang
Genshun Wan
Jianqing Gao
Zhen-Hua Ling
349
13
0
09 Feb 2025
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Soumya Dutta
Sriram Ganapathy
372
21
0
20 Jan 2025
SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery
Enneng Yang
Li Shen
Zhenyi Wang
G. Guo
Xingwei Wang
Xiaocun Cao
Jie Zhang
Dacheng Tao
MoMe
280
11
0
18 Oct 2024
Audio Explanation Synthesis with Generative Foundation Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Alican Akman
Qiyang Sun
Björn W. Schuller
299
2
0
10 Oct 2024
A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework
Zheng Nan
T. Dang
V. Sethu
Beena Ahmed
180
0
0
17 Sep 2024
Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection
Duc-Tuan Truong
Ruijie Tao
Tuan Nguyen
Hieu-Thi Luong
Kong Aik Lee
Eng Siong Chng
301
44
0
25 Jun 2024
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito
Vivek Iyer
Nikolaos Lagos
Laurent Besacier
Ioan Calapodescu
VLM
574
70
0
10 Jun 2024
A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability
Li-Yang Tseng
Tzu-Ling Lin
Hong-Han Shuai
Jen-Wei Huang
Wen-Whei Chang
191
1
0
21 May 2024
LLAniMAtion: LLAMA Driven Gesture Animation
John T. Windle
Iain Matthews
Sarah Taylor
290
1
0
13 May 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
319
62
0
15 Apr 2024
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
A. Haliassos
Andreas Zinonos
Rodrigo Mira
Stavros Petridis
Maja Pantic
VLM SSL AI4TS
339
26
0
02 Apr 2024
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
Luca Zampierin
G. B. Hacene
Bac Nguyen
Mirco Ravanelli
325
4
0
26 Feb 2024
AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies
José-M. Acosta-Triana
David Gimeno-Gómez
Carlos David Martínez Hinarejos
VLM VGen
328
4
0
20 Feb 2024
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification
Calum Heggan
S. Budgett
Timothy M. Hospedales
Mehrdad Yaghoobi
SSL
356
3
0
02 Feb 2024
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues
David Gimeno-Gómez
Ana-Maria Bucur
Adrian Cosma
Carlos David Martínez Hinarejos
Paolo Rosso
259
28
0
05 Jan 2024
FAT-HuBERT: Front-end Adaptive Training of Hidden-unit BERT for Distortion-Invariant Robust Speech Recognition
Automatic Speech Recognition & Understanding (ASRU), 2023
Dongning Yang
Wei Wang
Yanmin Qian
353
7
0
29 Nov 2023
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
International Conference on Natural Language and Speech Processing (ICNLSP), 2023
Shuyue Stella Li
Beining Xu
Xiangyu Zhang
Hexin Liu
Wen-Han Chao
Leibny Paola García
SSL
222
5
0
27 Nov 2023
Multi-objective Non-intrusive Hearing-aid Speech Assessment Model
Hsin-Tien Chiang
Szu-Wei Fu
Hsin-Min Wang
Yu Tsao
John H. L. Hansen
275
8
0
15 Nov 2023
Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Duc-Tuan Truong
Ruijie Tao
J. Yip
Kong Aik Lee
Chng Eng Siong
267
13
0
26 Sep 2023
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Computer Speech and Language (CSL), 2023
Titouan Parcollet
H. Nguyen
Solène Evain
Marcely Zanon Boito
Adrien Pupier
...
François Portet
Solange Rossato
Fabien Ringeval
D. Schwab
Laurent Besacier
301
31
0
11 Sep 2023
The Quest of Finding the Antidote to Sparse Double Descent
Victor Quétu
Marta Milovanović
353
0
0
31 Aug 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie Zhang
AI4TS
264
3
0
28 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Computer Speech and Language (CSL), 2023
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
277
20
0
28 Aug 2023
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping
IEEE International Conference on Computer Vision (ICCV), 2023
Y. A. D. Djilali
Sanath Narayan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
245
16
0
11 Aug 2023
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Varun Krishna
T. Sai
Sriram Ganapathy
SSL
189
3
0
14 Jul 2023
On the Effectiveness of Speech Self-supervised Learning for Music
International Society for Music Information Retrieval Conference (ISMIR), 2023
Yi Ma
Ruibin Yuan
Yi Zhou
Ge Zhang
Xingran Chen
...
Ruibo Liu
Gus Xia
Roger Dannenberg
Yi-Ting Guo
Jie Fu
194
14
0
11 Jul 2023
Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems
Interspeech (Interspeech), 2023
Jiajun Deng
Guinan Li
Xurong Xie
Zengrui Jin
Mingyu Cui
Tianzi Wang
Shujie Hu
Mengzhe Geng
Xunying Liu
BDL
253
2
0
26 Jun 2023
Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement
Interspeech (Interspeech), 2023
Hejung Yang
Hong-Goo Kang
SSL
221
1
0
14 Jun 2023
Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations
Interspeech (Interspeech), 2023
Salah Zaiem
Titouan Parcollet
S. Essid
225
2
0
01 Jun 2023
How to Estimate Model Transferability of Pre-Trained Speech Models?
Interspeech (Interspeech), 2023
Zih-Ching Chen
Chao-Han Huck Yang
Yue Liu
Yu Zhang
Nanxin Chen
Shoufeng Chang
Rohit Prabhavalkar
Hung-yi Lee
Tara N. Sainath
503
11
0
01 Jun 2023
MT-SLVR: Multi-Task Self-Supervised Learning for Transformation In(Variant) Representations
Interspeech (Interspeech), 2023
Calum Heggan
Timothy M. Hospedales
S. Budgett
Mehrdad Yaghoobi
SSL
383
7
0
29 May 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Interspeech (Interspeech), 2023
Wangyou Zhang
Y. Qian
286
12
0
25 May 2023
On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition
Interspeech (Interspeech), 2023
L. Bansal
S. P. Dubagunta
Malolan Chetlur
Pushpak Jagtap
A. Ganapathiraju
268
1
0
21 May 2023
Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations
International Conference on Machine Learning (ICML), 2023
Wei-wei Lin
Chenhang He
Man-Wai Mak
Youzhi Tu
221
6
0
14 May 2023
Continual Learning of Hand Gestures for Human-Robot Interaction
Xavier Cucurull
A. Garrell
175
3
0
13 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Computer Vision and Pattern Recognition (CVPR), 2023
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
447
3
0
12 Apr 2023
Nonlinear Independent Component Analysis for Principled Disentanglement in Unsupervised Deep Learning
Patterns (Patterns), 2023
Aapo Hyvarinen
Ilyes Khemakhem
H. Morioka
CML OOD
372
63
0
29 Mar 2023
Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022
ACM Transactions on Graphics (TOG), 2023
Taras Kucherenko
Pieter Wolfert
Youngwoo Yoon
Carla Viegas
Teodor Nikolov
Mihail Tsakov
G. Henter
219
36
0
15 Mar 2023
Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study
Salah Zaiem
Robin Algayres
Titouan Parcollet
S. Essid
Mirco Ravanelli
325
20
0
12 Mar 2023
Multi-Task Self-Supervised Time-Series Representation Learning
Information Sciences (Inf. Sci.), 2023
Heejeong Choi
Pilsung Kang
AI4TS SSL
311
21
0
02 Mar 2023
Can we avoid Double Descent in Deep Neural Networks?
IEEE International Conference on Image Processing (ICIP), 2023
Victor Quétu
Enzo Tartaglione
AI4CE
340
2
0
26 Feb 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data
International Conference on Learning Representations (ICLR), 2022
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
331
73
0
12 Dec 2022
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
Spandan Dey
Md. Sahidullah
G. Saha
230
33
0
30 Nov 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Interspeech (Interspeech), 2022
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
342
21
0
14 Nov 2022
Biased Self-supervised learning for ASR
Interspeech (Interspeech), 2022
Florian Kreyssig
Yangyang Shi
Jinxi Guo
Leda Sari
Abdel-rahman Mohamed
P. Woodland
SSL
216
4
0
04 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Neural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
460
10
0
02 Nov 2022
Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Sathvik Udupa
Siddarth C
P. Ghosh
231
11
0
30 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Qiu-shi Zhu
Long Zhou
Jie Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLM SSL
214
44
0
27 Oct 2022