ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1904.03416
  4. Cited By
Learning Problem-agnostic Speech Representations from Multiple
  Self-supervised Tasks

Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks

6 April 2019
Santiago Pascual
Mirco Ravanelli
Joan Serrà
Antonio Bonafonte
Yoshua Bengio
    SSL
ArXiv (abs)PDFHTML

Papers citing "Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks"

50 / 147 papers shown
Title
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio RepresentationsIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ruben Ciranni
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Giorgio Fabbro
Emanuele Rodolà
Luca Cosmo
634
16
0
10 Jan 2025
AfriHuBERT: A self-supervised speech representation model for African languages
AfriHuBERT: A self-supervised speech representation model for African languages
Jesujoba Oluwadara Alabi
Xuechen Liu
Dietrich Klakow
Junichi Yamagishi
VLM
427
11
0
30 Sep 2024
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Varun Krishna
Sriram Ganapathy
SSL
258
2
0
02 Jul 2024
mHuBERT-147: A Compact Multilingual HuBERT Model
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito
Vivek Iyer
Nikolaos Lagos
Laurent Besacier
Ioan Calapodescu
VLM
422
57
0
10 Jun 2024
A Dataset and Baselines for Measuring and Predicting the Music Piece
  Memorability
A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability
Li-Yang Tseng
Tzu-Ling Lin
Hong-Han Shuai
Jen-Wei Huang
Wen-Whei Chang
133
1
0
21 May 2024
Benchmarking Representations for Speech, Music, and Acoustic Events
Benchmarking Representations for Speech, Music, and Acoustic Events
Moreno La Quatra
Alkis Koudounas
Lorenzo Vaiani
Elena Baralis
Luca Cagliero
Paolo Garza
Sabato Marco Siniscalchi
163
28
0
02 May 2024
LongVQ: Long Sequence Modeling with Vector Quantization on Structured
  Memory
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
Zicheng Liu
Li Wang
Siyuan Li
Zedong Wang
Haitao Lin
Stan Z. Li
VLM
227
5
0
17 Apr 2024
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition
Yash Jain
David M. Chan
Pranav Dheram
Aparna Khare
Olabanji Shonibare
Venkatesh Ravichandran
Shalini Ghosh
222
2
0
28 Mar 2024
Exploring the Task-agnostic Trait of Self-supervised Learning in the
  Context of Detecting Mental Disorders
Exploring the Task-agnostic Trait of Self-supervised Learning in the Context of Detecting Mental Disorders
Rohan kumar Gupta
Rohit Sinha
237
0
0
22 Mar 2024
Reading Between the Frames: Multi-Modal Depression Detection in Videos
  from Non-Verbal Cues
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues
David Gimeno-Gómez
Ana-Maria Bucur
Adrian Cosma
Carlos David Martínez Hinarejos
Paolo Rosso
207
24
0
05 Jan 2024
CORN: Co-Trained Full- And No-Reference Speech Quality Assessment
CORN: Co-Trained Full- And No-Reference Speech Quality Assessment
Pranay Manocha
Donald Williamson
Adam Finkelstein
251
3
0
13 Oct 2023
Self-Supervised Learning for Audio-Based Emotion Recognition
Self-Supervised Learning for Audio-Based Emotion Recognition
Peranut Nimitsurachat
Peter Washington
194
3
0
23 Jul 2023
Large-scale unsupervised audio pre-training for video-to-speech
  synthesis
Large-scale unsupervised audio pre-training for video-to-speech synthesisIEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
215
5
0
27 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level
  and Frame-level Tasks
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level TasksIEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Xian Li
Nian Shao
Xiaofei Li
ViTCLIP
348
44
0
07 Jun 2023
Improved Cross-Lingual Transfer Learning For Automatic Speech
  Translation
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana
Nauman Dawalatabad
Antoine Laurent
Luis Vicente
Pablo Gimeno
Victoria Mingote
James R. Glass
VLM
335
1
0
01 Jun 2023
MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models
MiniSUPERB: Lightweight Benchmark for Self-supervised Speech ModelsAutomatic Speech Recognition & Understanding (ASRU), 2023
Yu-Hsiang Wang
Huan Chen
Kai-Wei Chang
Winston H. Hsu
Hung-yi Lee
418
7
0
30 May 2023
Self-supervised Neural Factor Analysis for Disentangling Utterance-level
  Speech Representations
Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech RepresentationsInternational Conference on Machine Learning (ICML), 2023
Wei-wei Lin
Chenhang He
Man-Wai Mak
Youzhi Tu
165
6
0
14 May 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to
  GPT-5 All You Need?
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
303
199
0
21 Mar 2023
Progressive Multi-Scale Self-Supervised Learning for Speech Recognition
Progressive Multi-Scale Self-Supervised Learning for Speech RecognitionAsia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022
Genshun Wan
Tan Liu
Hang Chen
Jia Pan
Cong Liu
Z. Ye
SSL
137
0
0
07 Dec 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by
  Integrating Multiple Targets
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple TargetsInterspeech (Interspeech), 2022
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
302
21
0
14 Nov 2022
Biased Self-supervised learning for ASR
Biased Self-supervised learning for ASRInterspeech (Interspeech), 2022
Florian Kreyssig
Yangyang Shi
Jinxi Guo
Leda Sari
Abdel-rahman Mohamed
P. Woodland
SSL
147
4
0
04 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech ProcessingNeural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
332
10
0
02 Nov 2022
Understanding Acoustic Patterns of Human Teachers Demonstrating
  Manipulation Tasks to Robots
Understanding Acoustic Patterns of Human Teachers Demonstrating Manipulation Tasks to RobotsIEEE/RJS International Conference on Intelligent RObots and Systems (IROS), 2022
Akanksha Saran
K. Desai
M. L. Chang
Rudolf Lioutikov
A. Thomaz
S. Niekum
138
4
0
01 Nov 2022
On the Role of Visual Context in Enriching Music Representations
On the Role of Visual Context in Enriching Music RepresentationsIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Kleanthis Avramidis
Shanti Stewart
Shrikanth Narayanan
155
4
0
28 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of
  Self-Supervised Speech Representation Learning
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation LearningSpoken Language Technology Workshop (SLT), 2022
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELMSSL
236
38
0
16 Oct 2022
Individualized Conditioning and Negative Distances for Speaker
  Separation
Individualized Conditioning and Negative Distances for Speaker SeparationInternational Conference on Machine Learning and Applications (ICMLA), 2022
Tao Sun
Nidal Abuhajar
Shuyu Gong
Zhewei Wang
Charles D. Smith
Xianhui Wang
Li Xu
Jundong Liu
VLM
151
1
0
12 Oct 2022
The Efficacy of Self-Supervised Speech Models for Audio Representations
The Efficacy of Self-Supervised Speech Models for Audio Representations
Tung-Yu Wu
Chen-An Li
Tzu-Han Lin
Tsung-Yuan Hsu
Hung-yi Lee
231
6
0
26 Sep 2022
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
Transfer Learning of wav2vec 2.0 for Automatic Lyric TranscriptionInternational Society for Music Information Retrieval Conference (ISMIR), 2022
Longshen Ou
Xiangming Gu
Ye Wang
165
24
0
20 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic
  Knowledge Distillation of Self-Supervised Speech Models
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech ModelsInterspeech (Interspeech), 2022
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
163
34
0
14 Jul 2022
Towards Proper Contrastive Self-supervised Learning Strategies For Music
  Audio Representation
Towards Proper Contrastive Self-supervised Learning Strategies For Music Audio RepresentationIEEE International Conference on Multimedia and Expo (ICME), 2022
Jeong-Eun Choi
Seongwon Jang
Hyunsouk Cho
Sehee Chung
SSL
151
11
0
10 Jul 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and
  Any-to-any Voice Conversion
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice ConversionInterspeech (Interspeech), 2022
Yinjiao Lei
Shan Yang
Jian Cong
Linfu Xie
Jane Polak Scowcroft
DiffM
178
14
0
05 Jul 2022
Wav2Vec-Aug: Improved self-supervised training with limited data
Wav2Vec-Aug: Improved self-supervised training with limited dataInterspeech (Interspeech), 2022
Anuroop Sriram
Michael Auli
Alexei Baevski
SSLVLM
169
16
0
27 Jun 2022
Interpretable Acoustic Representation Learning on Breathing and Speech
  Signals for COVID-19 Detection
Interpretable Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 DetectionInterspeech (Interspeech), 2022
Debottam Dutta
Debarpan Bhattacharya
Sriram Ganapathy
A. H. Poorjam
Deepak Mittal
M. Singh
87
1
0
27 Jun 2022
Boosting Cross-Domain Speech Recognition with Self-Supervision
Boosting Cross-Domain Speech Recognition with Self-SupervisionIEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Hanjing Zhu
Gaofeng Cheng
Yongfeng Zhang
Wenxin Hou
Pengyuan Zhang
Yonghong Yan
342
22
0
20 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning
  on Multimodal and Temporal Data
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data
Shohreh Deldari
Hao Xue
Aaqib Saeed
Jiayuan He
Daniel V. Smith
Flora D. Salim
AI4TS
227
43
0
06 Jun 2022
A Multimodal Corpus for Emotion Recognition in Sarcasm
A Multimodal Corpus for Emotion Recognition in SarcasmInternational Conference on Language Resources and Evaluation (LREC), 2022
Anupama Ray
Shubham Mishra
Apoorva Nunna
P. Bhattacharyya
143
60
0
05 Jun 2022
Self-Supervised Speech Representation Learning: A Review
Self-Supervised Speech Representation Learning: A ReviewIEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSLAI4TS
634
441
0
21 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual
  Speech Representation
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech RepresentationIEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Sameer Khurana
Antoine Laurent
James R. Glass
157
43
0
17 May 2022
Sound Localization by Self-Supervised Time Delay Estimation
Sound Localization by Self-Supervised Time Delay EstimationEuropean Conference on Computer Vision (ECCV), 2022
Ziyang Chen
David Fouhey
Andrew Owens
SSL
239
23
0
26 Apr 2022
Cross-stitched Multi-modal Encoders
Cross-stitched Multi-modal Encoders
Karan Singla
Daniel Pressel
Ryan Price
Bhargav Srinivas Chinnari
Yeon-Jun Kim
S. Bangalore
147
0
0
20 Apr 2022
ContentVec: An Improved Self-Supervised Speech Representation by
  Disentangling Speakers
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling SpeakersInternational Conference on Machine Learning (ICML), 2022
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
181
141
0
20 Apr 2022
Self Supervised Adversarial Domain Adaptation for Cross-Corpus and
  Cross-Language Speech Emotion Recognition
Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion RecognitionIEEE Transactions on Affective Computing (IEEE TAC), 2022
S. Latif
R. Rana
Sara Khalifa
Raja Jurdak
Björn Schuller
184
64
0
19 Apr 2022
On the pragmatism of using binary classifiers over data intensive neural
  network classifiers for detection of COVID-19 from voice
On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice
Ankit Parag Shah
Hira Dhamyal
Yang Gao
Daniel Arancibia
Mario Arancibia
Bhiksha Raj
Rita Singh
156
6
0
11 Apr 2022
Federated Self-supervised Speech Representations: Are We There Yet?
Federated Self-supervised Speech Representations: Are We There Yet?Interspeech (Interspeech), 2022
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Abhinav Mehrotra
Nicholas D. Lane
151
14
0
06 Apr 2022
Federated Self-Supervised Learning for Acoustic Event Classification
Federated Self-Supervised Learning for Acoustic Event ClassificationIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Meng Feng
Chieh-Chi Kao
Qingming Tang
Ming Sun
Viktor Rozgic
Spyros Matsoukas
Chao Wang
158
14
0
22 Mar 2022
HEAR: Holistic Evaluation of Audio Representations
HEAR: Holistic Evaluation of Audio RepresentationsNeural Information Processing Systems (NeurIPS), 2022
Joseph P. Turian
Jordie Shier
H. Khan
Bhiksha Raj
Björn W. Schuller
...
P. Esling
Pranay Manocha
Shinji Watanabe
Zeyu Jin
Yonatan Bisk
375
133
0
06 Mar 2022
Audio Self-supervised Learning: A Survey
Audio Self-supervised Learning: A SurveyPatterns (Patterns), 2022
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
234
128
0
02 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDLAI4TSSSL
227
13
0
01 Mar 2022
Visual Speech Recognition for Multiple Languages in the Wild
Visual Speech Recognition for Multiple Languages in the WildNature Machine Intelligence (Nat. Mach. Intell.), 2022
Pingchuan Ma
Stavros Petridis
Maja Pantic
VLM
389
191
0
26 Feb 2022
CALM: Contrastive Aligned Audio-Language Multirate and Multimodal
  Representations
CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations
Vin Sachidananda
Shao-Yen Tseng
Erik Marchi
S. Kajarekar
P. Georgiou
134
9
0
08 Feb 2022
123
Next