2111.02735
v3 (latest)
A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
4 November 2021
Yingzhi Wang
Abdelmoumene Boumadane
A. Heba
Papers citing
"A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding"
50 / 82 papers shown
Enabling Automatic Self-Talk Detection via Earables
Euihyeok Lee
Seonghyeon Kim
Sanghun Im
Heung-Seon Oh
Seungwoo Kang
68
0
0
10 Nov 2025
MT-HuBERT: Self-Supervised Mix-Training for Few-Shot Keyword Spotting in Mixed Speech
Junming Yuan
Ying Shi
D. Wang
Lantian Li
A. Hamdulla
SSL
344
0
0
09 Nov 2025
Joint Learning using Mixture-of-Expert-Based Representation for Enhanced Speech Generation and Robust Emotion Recognition
Jing-Tong Tzeng
John H. L. Hansen
Chi-Chun Lee
MoE
116
1
0
10 Sep 2025
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
Hugo Thimonier
Antony Perzo
Renaud Seguier
104
1
0
19 Aug 2025
EDTalk++: Full Disentanglement for Controllable Talking Head Synthesis
Shuai Tan
Bin Ji
154
0
0
19 Aug 2025
Human Feedback Driven Dynamic Speech Emotion Recognition
Ilya Fedorov
Dmitry Korobchenko
40
0
0
18 Aug 2025
Deep Learning Approaches for Multimodal Intent Recognition: A Survey
Jingwei Zhao
Yuhua Wen
Qifei Li
Minchi Hu
Yingying Zhou
...
Junyang Wu
Yingming Gao
Zhengqi Wen
Jianhua Tao
Ya Li
ViT
136
1
0
24 Jul 2025
Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information
Nicholas Sanders
Yuanchao Li
Korin Richmond
Simon King
172
0
0
21 May 2025
Representation of perceived prosodic similarity of conversational feedback
Livia Qian
Carol Figueroa
Gabriel Skantze
105
0
0
19 May 2025
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
Computer Speech and Language (CSL), 2025
Paige Tuttosi
Mantaj Dhillon
Luna Sang
Shane Eastwood
Poorvi Bhatia
Quang Minh Dinh
Avni Kapoor
Yewon Jin
Angelica Lim
299
2
0
30 Apr 2025
Can Diffusion Models Disentangle? A Theoretical Perspective
Liming Wang
Muhammad Jehanzeb Mirza
Yishu Gong
Yuan Gong
Jiaqi Zhang
Brian Tracey
Katerina Placek
Marco Vilela
James Glass
DiffM
CoGe
370
0
0
31 Mar 2025
Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Aneesha Sampath
James Tavernor
E. Provost
287
3
0
17 Feb 2025
Evaluating the Impact of Discriminative and Generative E2E Speech Enhancement Models on Syllable Stress Preservation
Rangavajjala Sankara Bharadwaj
Jhansi Mallela
Sai Harshitha Aluru
Chiranjeevi Yarra
158
1
0
11 Dec 2024
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Li-Wei Chen
Takuya Higuchi
He Bai
Ahmed Hussen Abdelaziz
Alexander Rudnicky
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
Zakaria Aldeneh
265
1
0
16 Sep 2024
Continuous Learning of Transformer-based Audio Deepfake Detection
Tuan Duy Nguyen Le
Kah Kuan Teh
Huy Dat Tran
ViT
158
6
0
09 Sep 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
265
4
0
23 Aug 2024
VCEMO: Multi-Modal Emotion Recognition for Chinese Voiceprints
Jinghua Tang
Liyun Zhang
Yu Lu
Lanqing Yang
YiChao Chen
Minjie Bian
Xiaoshan Li
Guangtao Xue
114
2
0
23 Aug 2024
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection
Yi Zhu
Surya Koppisetti
Trang Tran
Gaurav Bharaj
353
22
0
26 Jul 2024
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
Li Zhang
Ning Jiang
Qing Wang
Yuehong Li
Quan Lu
Lei Xie
201
14
0
14 Jul 2024
MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition
J. Duret
Mickael Rouvier
Yannick Esteve
114
4
0
08 Jul 2024
A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition
Shreya G. Upadhyay
John H. L. Hansen
Chi-Chun Lee
218
7
0
06 Jul 2024
Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations
Bulat Khaertdinov
Pedro Jeuris
Annanda Sousa
Enrique Hortal
180
2
0
12 Jun 2024
ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets
Shahin Amiriparian
Filip Packań
Maurice Gerczuk
Björn W. Schuller
87
17
0
11 Jun 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
372
67
0
14 May 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
224
55
0
15 Apr 2024
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
European Conference on Computer Vision (ECCV), 2024
Shuai Tan
Bin Ji
Mengxiao Bi
Ye Pan
218
63
0
02 Apr 2024
Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters
Umberto Cappellazzo
Daniele Falavigna
Alessio Brutti
MoE
152
6
0
01 Feb 2024
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
Zakaria Aldeneh
Takuya Higuchi
Jee-weon Jung
Skyler Seto
Tatiana Likhomanenko
Stephen Shum
Ahmed Hussen Abdelaziz
Shinji Watanabe
B. Theobald
SSL
144
4
0
01 Feb 2024
A Multi-Task, Multi-Modal Approach for Predicting Categorical and Dimensional Emotions
Alex-Răzvan Ispas
Théo Deschamps-Berger
Laurence Devillers
145
4
0
31 Dec 2023
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
252
223
0
23 Dec 2023
Speech and Text-Based Emotion Recognizer
Varun Sharma
62
0
0
10 Dec 2023
Generalized zero-shot audio-to-intent classification
Automatic Speech Recognition & Understanding (ASRU), 2023
Veera Raghavendra Elluru
Devang Kulshreshtha
Rohit Paturi
S. Bodapati
S. Ronanki
170
4
0
04 Nov 2023
Enhancing expressivity transfer in textless speech-to-speech translation
Automatic Speech Recognition & Understanding (ASRU), 2023
J. Duret
Benjamin O’Brien
Yannick Esteve
Titouan Parcollet
135
3
0
11 Oct 2023
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jianqiao Lu
Wenyong Huang
Nianzu Zheng
Xingshan Zeng
Y. Yeung
Xiao Chen
SyDa
217
1
0
09 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
International Conference on Learning Representations (ICLR), 2023
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
223
35
0
04 Oct 2023
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shuai Wang
Qibing Bai
Qi Liu
Jianwei Yu
Zhengyang Chen
Bing Han
Yan-min Qian
Haizhou Li
187
2
0
21 Sep 2023
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ziyang Ma
Wen Wu
Zhisheng Zheng
Yiwei Guo
Qian Chen
Shiliang Zhang
Xie Chen
204
27
0
19 Sep 2023
Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023
ACM Multimedia (ACM MM), 2023
Haotian Wang
Yuxuan Xi
Hang Chen
Jun Du
Yan Song
...
Pengfei Hu
Ya Jiang
Shi Cheng
Jie Zhang
Yuzhe Weng
176
4
0
11 Sep 2023
Speech Emotion Recognition with Distilled Prosodic and Linguistic Affect Representations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Debaditya Shome
Ali Etemad
145
9
0
09 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Interspeech (Interspeech), 2023
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
197
6
0
05 Sep 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Computer Speech and Language (CSL), 2023
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
190
18
0
28 Aug 2023
Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition
Anant Singh
Akshat Gupta
141
5
0
17 Aug 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
IEEE transactions on multimedia (IEEE TMM), 2023
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
134
25
0
15 Aug 2023
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling
Interspeech (Interspeech), 2023
Hengguan Huang
Jagadeesh Balam
Boris Ginsburg
159
6
0
13 Jul 2023
Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data
Computer Speech and Language (CSL), 2023
Guangzhi Sun
Chuxu Zhang
Ivan Vulić
Paweł Budzianowski
P. Woodland
172
6
0
04 Jul 2023
Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data
Speech Synthesis Workshop (SSW), 2023
J. Duret
Titouan Parcollet
Yannick Esteve
129
4
0
29 Jun 2023
Speech Emotion Diarization: Which Emotion Appears When?
Automatic Speech Recognition & Understanding (ASRU), 2023
Yingzhi Wang
Mirco Ravanelli
Alya Yacoubi
123
20
0
22 Jun 2023
Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2023
Yuya Yamamoto
133
3
0
22 Jun 2023
Unsupervised speech intelligibility assessment with utterance level alignment distance between teacher and learner Wav2Vec-2.0 representations
Nayan Anand
Meenakshi Sirigiraju
Chiranjeevi Yarra
93
1
0
15 Jun 2023
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
Interspeech (Interspeech), 2023
Haiyang Sun
Fulin Zhang
Yingying Gao
Zheng Lian
Shilei Zhang
Junlan Feng
144
7
0
12 Jun 2023