A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding

4 November 2021

Yingzhi Wang

Abdelmoumene Boumadane

A. Heba

Papers citing "A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding"

33 / 83 papers shown

MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition

Interspeech (Interspeech), 2023

154

12 Jun 2023

Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?

Interspeech (Interspeech), 2023

Mirco Ravanelli

212

01 Jun 2023

CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Linhao Dong

108

27 May 2023

Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding

Interspeech (Interspeech), 2023

Mutian He

Philip N. Garner

ELM AI4MH LRM

253

22 May 2023

Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation

Interspeech (Interspeech), 2023

307

19 May 2023

The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Mutian He

Philip N. Garner

326

16 May 2023

Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations

International Conference on Machine Learning (ICML), 2023

171

14 May 2023

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

Automatic Speech Recognition & Understanding (ASRU), 2023

...

Krishna Puvvada

Jagadeesh Balam

Boris Ginsburg

330

144

08 May 2023

A vector quantized masked autoencoder for audiovisual speech emotion recognition

Computer Vision and Image Understanding (CVIU), 2023

472

05 May 2023

A vector quantized masked autoencoder for speech emotion recognition

Samir Sadok

Simon Leglaive

Renaud Séguier

245

21 Apr 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

International Conference on Machine Learning (ICML), 2023

Hainan Xu

Boris Ginsburg

187

13 Apr 2023

Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Nikolaos Antoniou

Athanasios Katsamanis

Theodoros Giannakopoulos

Shrikanth Narayanan

180

03 Apr 2023

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Dongsheng Li

144

14 Mar 2023

Skit-S2I: An Indian Accented Speech to Intent dataset

Shangeth Rajaa

Swaraj Dalmia

Kumarmanas Nethil

183

26 Dec 2022

Disentangling Prosody Representations with Unsupervised Speech Reconstruction

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

Leyuan Qu

Taihao Li

C. Weber

Theresa Pekarek-Rosin

F. Ren

S. Wermter

242

14 Dec 2022

Parameter Efficient Transfer Learning for Various Speech Processing Tasks

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Shinta Otake

Rei Kawakami

Nakamasa Inoue

176

06 Dec 2022

Bidirectional Representations for Low Resource Spoken Language Understanding

Applied Sciences (Appl. Sci.), 2022

Quentin Meeus

Marie-Francine Moens

Hugo Van hamme

188

24 Nov 2022

Multi-Label Training for Text-Independent Speaker Identification

Yuqi Xue

153

14 Nov 2022

Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

141

03 Nov 2022

Phoneme Segmentation Using Self-Supervised Speech Models

Spoken Language Technology Workshop (SLT), 2022

Luke Strgar

David Harwath

SSL

171

02 Nov 2022

Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Liyong Guo

Xiaoyu Yang

Quandong Wang

Yuxiang Kong

Zengwei Yao

...

Wei Kang

Long Lin

188

31 Oct 2022

Application of Knowledge Distillation to Multi-task Speech Representation Learning

Interspeech (Interspeech), 2022

185

29 Oct 2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Spoken Language Technology Workshop (SLT), 2022

Tzu-Quan Lin

...

243

16 Oct 2022

Training speech emotion classifier without categorical annotations

Meysam Shamsi

Marie Tahon

209

14 Oct 2022

An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis

126

28 Sep 2022

Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations

Detai Xin

Shinnosuke Takamichi

Hiroshi Saruwatari

21 Jun 2022

Self-Supervised Speech Representation Learning: A Review

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

Abdel-rahman Mohamed

Hung-yi Lee

Lasse Borgholt

Jakob Drachmann Havtorn

...

668

444

21 May 2022

Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Qianying Liu

Zhuo Gong

Zhengdong Yang

Yuhang Yang

Sheng Li

...

Sadao Kurohashi

177

08 Apr 2022

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Interspeech (Interspeech), 2022

Ryandhimas E. Zezario

277

07 Apr 2022

Probing Speech Emotion Recognition Transformers for Linguistic Knowledge

Interspeech (Interspeech), 2022

Andreas Triantafyllopoulos

Björn W. Schuller

326

01 Apr 2022

Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features

Jialu Li

M. Hasegawa-Johnson

Nancy L. McElwain

122

29 Mar 2022

Dawn of the transformer era in speech emotion recognition: closing the valence gap

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

Johannes Wagner

Andreas Triantafyllopoulos

Björn W. Schuller

389

409

14 Mar 2022

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019

482

393

25 Oct 2019