A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding

4 November 2021

Yingzhi Wang

Abdelmoumene Boumadane

A. Heba

Papers citing "A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding"

50 / 83 papers shown

EM2LDL: A Multilingual Speech Corpus for Mixed Emotion Recognition through Label Distribution Learning

25 Nov 2025

Enabling Automatic Self-Talk Detection via Earables

10 Nov 2025

MT-HuBERT: Self-Supervised Mix-Training for Few-Shot Keyword Spotting in Mixed Speech

425

09 Nov 2025

Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition

153

10 Sep 2025

EDTalk++: Full Disentanglement for Controllable Talking Head Synthesis

Shuai Tan

Bin Ji

186

19 Aug 2025

EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition

Hugo Thimonier

Antony Perzo

Renaud Seguier

145

19 Aug 2025

Human Feedback Driven Dynamic Speech Emotion Recognition

Ilya Fedorov

Dmitry Korobchenko

18 Aug 2025

Deep Learning Approaches for Multimodal Intent Recognition: A Survey

...

191

24 Jul 2025

Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information

214

21 May 2025

Representation of perceived prosodic similarity of conversational feedback

Livia Qian

Carol Figueroa

Gabriel Skantze

120

19 May 2025

BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition

Computer Speech and Language (CSL), 2025

332

30 Apr 2025

Can Diffusion Models Disentangle? A Theoretical Perspective

Liming Wang

Muhammad Jehanzeb Mirza

399

31 Mar 2025

Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Aneesha Sampath

James Tavernor

E. Provost

311

17 Feb 2025

Evaluating the Impact of Discriminative and Generative E2E Speech Enhancement Models on Syllable Stress Preservation

Rangavajjala Sankara Bharadwaj

Jhansi Mallela

Sai Harshitha Aluru

Chiranjeevi Yarra

191

11 Dec 2024

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models

Li-Wei Chen

Takuya Higuchi

He Bai

Ahmed Hussen Abdelaziz

Alexander Rudnicky

Shinji Watanabe

Tatiana Likhomanenko

B. Theobald

Zakaria Aldeneh

311

16 Sep 2024

Continuous Learning of Transformer-based Audio Deepfake Detection

184

09 Sep 2024

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

He Huang

Taejin Park

Kunal Dhawan

Jagadeesh Balam

Boris Ginsburg

SSL AI4TS

330

23 Aug 2024

VCEMO: Multi-Modal Emotion Recognition for Chinese Voiceprints

166

23 Aug 2024

SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection

407

26 Jul 2024

Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification

Lei Xie

238

14 Jul 2024

MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition

J. Duret

Mickael Rouvier

Yannick Esteve

118

08 Jul 2024

A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition

Shreya G. Upadhyay

John H. L. Hansen

Chi-Chun Lee

269

06 Jul 2024

Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations

232

12 Jun 2024

ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets

Shahin Amiriparian

Filip Packań

Maurice Gerczuk

Björn W. Schuller

104

11 Jun 2024

SpeechVerse: A Large-scale Generalizable Audio Language Model

...

485

14 May 2024

A Large-Scale Evaluation of Speech Foundation Models

...

Shinji Watanabe

Hung-yi Lee

278

15 Apr 2024

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

European Conference on Computer Vision (ECCV), 2024

259

02 Apr 2024

Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters

189

01 Feb 2024

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Stephen Shum

Ahmed Hussen Abdelaziz

Shinji Watanabe

B. Theobald

SSL

170

01 Feb 2024

A Multi-Task, Multi-Modal Approach for Predicting Categorical and Dimensional Emotions

Alex-Răzvan Ispas

Théo Deschamps-Berger

Laurence Devillers

147

31 Dec 2023

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Xie Chen

302

243

23 Dec 2023

Speech and Text-Based Emotion Recognizer

Varun Sharma

10 Dec 2023

Generalized zero-shot audio-to-intent classification

Automatic Speech Recognition & Understanding (ASRU), 2023

Veera Raghavendra Elluru

207

04 Nov 2023

Enhancing expressivity transfer in textless speech-to-speech translation

Automatic Speech Recognition & Understanding (ASRU), 2023

J. Duret

Benjamin O’Brien

Yannick Esteve

Titouan Parcollet

168

11 Oct 2023

Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Xingshan Zeng

256

09 Oct 2023

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

International Conference on Learning Representations (ICLR), 2023

Jiatong Shi

261

04 Oct 2023

Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Shuai Wang

Qi Liu

Haizhou Li

214

21 Sep 2023

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Ziyang Ma

Wen Wu

Zhisheng Zheng

Yiwei Guo

Qian Chen

Shiliang Zhang

Xie Chen

244

19 Sep 2023

Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

ACM Multimedia (ACM MM), 2023

...

213

11 Sep 2023

Speech Emotion Recognition with Distilled Prosodic and Linguistic Affect Representations

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Debaditya Shome

Ali Etemad

183

09 Sep 2023

Leveraging Label Information for Multimodal Emotion Recognition

Interspeech (Interspeech), 2023

239

05 Sep 2023

Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

Computer Speech and Language (CSL), 2023

Mirco Ravanelli

234

28 Aug 2023

Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition

Anant Singh

Akshat Gupta

188

17 Aug 2023

AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model

IEEE Transactions on Multimedia (IEEE TMM), 2023

Jeong Hun Yeo

187

15 Aug 2023

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling

Interspeech (Interspeech), 2023

Hengguan Huang

Jagadeesh Balam

Boris Ginsburg

181

13 Jul 2023

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data

Computer Speech and Language (CSL), 2023

Guangzhi Sun

189

04 Jul 2023

Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data

Speech Synthesis Workshop (SSW), 2023

J. Duret

Titouan Parcollet

Yannick Esteve

133

29 Jun 2023

Speech Emotion Diarization: Which Emotion Appears When?

Automatic Speech Recognition & Understanding (ASRU), 2023

Yingzhi Wang

Mirco Ravanelli

Alya Yacoubi

149

22 Jun 2023

Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2023

Yuya Yamamoto

168

22 Jun 2023

Unsupervised speech intelligibility assessment with utterance level alignment distance between teacher and learner Wav2Vec-2.0 representations

Nayan Anand

Meenakshi Sirigiraju

Chiranjeevi Yarra

123

15 Jun 2023