A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding

4 November 2021
Yingzhi Wang
Abdelmoumene Boumadane
A. Heba
arXiv: 2111.02735

Papers citing "A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding"

33 / 83 papers shown
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
Interspeech (Interspeech), 2023
Haiyang Sun
Fulin Zhang
Yingying Gao
Zheng Lian
Shilei Zhang
Junlan Feng
154
7
0
12 Jun 2023
Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?
Interspeech (Interspeech), 2023
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
212
33
0
01 Jun 2023
CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Linhao Dong
Zhecheng An
Peihao Wu
Jun Zhang
Lu Lu
Zejun Ma
108
6
0
27 May 2023
Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding
Interspeech (Interspeech), 2023
Mutian He
Philip N. Garner
ELM, AI4MH, LRM
253
35
0
22 May 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Interspeech (Interspeech), 2023
Kangwook Jang
Sungnyun Kim
Se-Young Yun
Hoi-Rim Kim
307
7
0
19 May 2023
The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Mutian He
Philip N. Garner
326
5
0
16 May 2023
Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations
International Conference on Machine Learning (ICML), 2023
Wei-wei Lin
Chenhang He
Man-Wai Mak
Youzhi Tu
171
6
0
14 May 2023
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
Automatic Speech Recognition & Understanding (ASRU), 2023
Dima Rekesh
Nithin Rao Koluguri
Samuel Kriman
Somshubra Majumdar
Vahid Noroozi
...
Oleksii Hrinchuk
Krishna Puvvada
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
330
144
0
08 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Computer Vision and Image Understanding (CVIU), 2023
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
472
11
0
05 May 2023
A vector quantized masked autoencoder for speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
245
26
0
21 Apr 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
International Conference on Machine Learning (ICML), 2023
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
187
44
0
13 Apr 2023
Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Nikolaos Antoniou
Athanasios Katsamanis
Theodoros Giannakopoulos
Shrikanth Narayanan
180
24
0
03 Apr 2023
A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jinchao Li
Xixin Wu
Kaitao Song
Dongsheng Li
Xunying Liu
Helen M. Meng
144
2
0
14 Mar 2023
Skit-S2I: An Indian Accented Speech to Intent dataset
Shangeth Rajaa
Swaraj Dalmia
Kumarmanas Nethil
183
6
0
26 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
242
16
0
14 Dec 2022
Parameter Efficient Transfer Learning for Various Speech Processing Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Shinta Otake
Rei Kawakami
Nakamasa Inoue
176
21
0
06 Dec 2022
Bidirectional Representations for Low Resource Spoken Language Understanding
Applied Sciences (Appl. Sci.), 2022
Quentin Meeus
Marie-Francine Moens
Hugo Van hamme
188
2
0
24 Nov 2022
Multi-Label Training for Text-Independent Speaker Identification
Yuqi Xue
153
0
0
14 Nov 2022
Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Sofoklis Kakouros
Themos Stafylakis
Ladislav Mošner
L. Burget
141
19
0
03 Nov 2022
Phoneme Segmentation Using Self-Supervised Speech Models
Spoken Language Technology Workshop (SLT), 2022
Luke Strgar
David Harwath
SSL
171
13
0
02 Nov 2022
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Liyong Guo
Xiaoyu Yang
Quandong Wang
Yuxiang Kong
Zengwei Yao
...
Wei Kang
Long Lin
Mingshuang Luo
Piotr Żelasko
Daniel Povey
VLM
188
10
0
31 Oct 2022
Application of Knowledge Distillation to Multi-task Speech Representation Learning
Interspeech (Interspeech), 2022
Mine Kerpicci
V. Nguyen
Shuhua Zhang
Erik M. Visser
185
0
0
29 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Spoken Language Technology Workshop (SLT), 2022
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM, SSL
243
38
0
16 Oct 2022
Training speech emotion classifier without categorical annotations
Meysam Shamsi
Marie Tahon
209
2
0
14 Oct 2022
An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis
Tobias Hallmen
Silvan Mertes
Dominik Schiller
Elisabeth André
126
5
0
28 Sep 2022
Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations
Detai Xin
Shinnosuke Takamichi
Hiroshi Saruwatari
97
15
0
21 Jun 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal of Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL, AI4TS
668
444
0
21 May 2022
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Qianying Liu
Zhuo Gong
Zhengdong Yang
Yuhang Yang
Sheng Li
...
Nobuaki Minematsu
Hao-Ming Huang
Fei Cheng
Chenhui Chu
Sadao Kurohashi
177
10
0
08 Apr 2022
MTI-Net: A Multi-Target Speech Intelligibility Prediction Model
Interspeech (Interspeech), 2022
Ryandhimas E. Zezario
Szu-Wei Fu
Fei Chen
C. Fuh
Hsin-Min Wang
Yu Tsao
277
17
0
07 Apr 2022
Probing Speech Emotion Recognition Transformers for Linguistic Knowledge
Interspeech (Interspeech), 2022
Andreas Triantafyllopoulos
Johannes Wagner
H. Wierstorf
Maximilian Schmitt
U. Reichel
F. Eyben
Felix Burkhardt
Björn W. Schuller
326
33
0
01 Apr 2022
Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features
Jialu Li
M. Hasegawa-Johnson
Nancy L. McElwain
122
1
0
29 Mar 2022
Dawn of the transformer era in speech emotion recognition: closing the valence gap
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Johannes Wagner
Andreas Triantafyllopoulos
H. Wierstorf
Maximilian Schmitt
Felix Burkhardt
F. Eyben
Björn W. Schuller
389
409
0
14 Mar 2022
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Andy T. Liu
Shu-Wen Yang
Po-Han Chi
Po-Chun Hsu
Hung-yi Lee
SSL
482
393
0
25 Oct 2019