
Fine-tuning wav2vec2 for speaker recognition

30 September 2021

Nik Vaessen

David A. van Leeuwen


Papers citing "Fine-tuning wav2vec2 for speaker recognition"

50 / 51 papers shown

Dialect Identification Using Resource-Efficient Fine-Tuning Approaches

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2025

Zirui Lin

Haris Gulzar

Monnika Roslianna Busto

Akiko Masaki

Takeharu Eda

K. Nakadai

30 Nov 2025

XLSR-Kanformer: A KAN-Intergrated model for Synthetic Speech Detection

Advanced Video and Signal Based Surveillance (AVSS), 2025

Phuong Tuan Dat

Tran Huy Dat

102

08 Oct 2025

Pushing the Performance of Synthetic Speech Detection with Kolmogorov-Arnold Networks and Self-Supervised Learning Models

Tuan Dat Phuong

Long-Vu Hoang

Huy-Dat Tran

138

17 Jun 2025

Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues

IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2025

Rui Ribeiro

Luísa Coheur

Joao Paulo Carvalho

266

21 Apr 2025

Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

Aneesha Sampath

James Tavernor

E. Provost

307

17 Feb 2025

Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification

Bei Liu

Yanmin Qian

443

02 Dec 2024

Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques

Applied Soft Computing (Appl. Soft Comput.), 2024

David Ortiz-Perez

Manuel Benavent-Lledo

José García Rodríguez

David Tomás

M. Flores Vizcaya-Moreno

231

24 Oct 2024

Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models

Jin Sob Kim

Hyun Joon Park

Wooseok Shin

Juan Yun

Sung Won Han

SLR

454

12 Sep 2024

ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024

247

28 Jul 2024

SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection

400

26 Jul 2024

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

Shuai Wang

Zheng-Shou Chen

Kong Aik Lee

Yan-min Qian

Haizhou Li

344

21 Jul 2024

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

Shujie Hu

Xurong Xie

Mengzhe Geng

Zengrui Jin

Jiajun Deng

...

Yi Wang

Mingyu Cui

Tianzi Wang

Helen Meng

Xunying Liu

248

03 Jul 2024

Target Speech Extraction with Pre-trained Self-supervised Learning Models

220

17 Feb 2024

Probing Self-supervised Learning Models with Target Speech Extraction

260

17 Feb 2024

Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning

IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024

290

03 Jan 2024

Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Zengrui Jin

Shujie Hu

Tianzi Wang

163

01 Jan 2024

Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference

ACM Multimedia (ACM MM), 2023

182

16 Oct 2023

Wav2vec-based Detection and Severity Level Classification of Dysarthria from Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Farhad Javanmardi

Saska Tirronen

Manila Kodali

Sudarsana Reddy Kadiri

P. Alku

199

25 Sep 2023

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

Computer Speech and Language (CSL), 2023

...

260

11 Sep 2023

Fairness and Privacy in Voice Biometrics: A Study of Gender Influences Using wav2vec 2.0

Biometrics and Electronic Signatures (BES), 2023

156

27 Aug 2023

An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification

Harunori Kawano

Sota Shimizu

146

22 Aug 2023

Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation

Applied Acoustics (Appl. Acoust.), 2023

186

09 Aug 2023

Investigation of Self-supervised Pre-trained Models for Classification of Voice Quality from Speech and Neck Surface Accelerometer Signals

Computer Speech and Language (CSL), 2023

Sudarsana Reddy Kadiri

Farhad Javanmardi

P. Alku

06 Aug 2023

Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2023

Yuya Yamamoto

153

22 Jun 2023

Unsupervised speech intelligibility assessment with utterance level alignment distance between teacher and learner Wav2Vec-2.0 representations

Nayan Anand

Meenakshi Sirigiraju

Chiranjeevi Yarra

117

15 Jun 2023

Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models

Interspeech, 2023

Danilo de Oliveira

N. Prabhu

Timo Gerkmann

120

30 May 2023

Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model

Interspeech, 2023

Aoi Ito

Shota Horiguchi

SSL

134

24 May 2023

Lightweight Toxicity Detection in Spoken Language: A Transformer-based Approach for Edge Devices

Ahlam Husni Abu Nada

S. Latif

Junaid Qadir

132

22 Apr 2023

The Graph feature fusion technique for speaker recognition based on wav2vec2.0 framework

Zirui Ge

Haiyan Guo

Zhen Yang

163

19 Mar 2023

Audio-Visual Deception Detection: DOLOS Dataset and Parameter-Efficient Crossmodal Learning

IEEE International Conference on Computer Vision (ICCV), 2023

Xiaobao Guo

Nithish Muthuchamy Selvaraj

09 Mar 2023

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Shujie Hu

Zengrui Jin

Jiajun Deng

142

28 Feb 2023

Towards multi-task learning of speech and speaker recognition

Interspeech, 2023

Nik Vaessen

David A. van Leeuwen

CVBM

160

24 Feb 2023

Speaker and Language Change Detection using Wav2vec2 and Whisper

Tijn Berns

Nik Vaessen

David A. van Leeuwen

161

18 Feb 2023

Residual Information in Deep Speaker Embedding Architectures

Adriana Stan

151

06 Feb 2023

Parameter Efficient Transfer Learning for Various Speech Processing Tasks

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Shinta Otake

Rei Kawakami

Nakamasa Inoue

172

06 Dec 2022

Multi-Label Training for Text-Independent Speaker Identification

Yuqi Xue

142

14 Nov 2022

Integrated Parameter-Efficient Tuning for General-Purpose Audio Models

169

04 Nov 2022

Dynamic Kernels and Channel Attention for Low Resource Speaker Verification

A. Ollerenshaw

Md. Asif Jalal

Thomas Hain

118

03 Nov 2022

Application of Knowledge Distillation to Multi-task Speech Representation Learning

Interspeech, 2022

179

29 Oct 2022

Universal speaker recognition encoders for different speech segments duration

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Sergey Novoselov

V. Volokhov

G. Lavrentyeva

28 Oct 2022

Fast Yet Effective Speech Emotion Recognition with Self-distillation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Zhao Ren

Thanh Tam Nguyen

Yi Chang

Björn W. Schuller

109

26 Oct 2022

Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Evonne Lee

Guangzhi Sun

Chuxu Zhang

P. Woodland

167

24 Oct 2022

Large-scale learning of generalised representations for speaker recognition

Hye-jin Shim

195

20 Oct 2022

Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

Spoken Language Technology Workshop (SLT), 2022

116

15 Oct 2022

Fine-tuning Wav2vec for Vocal-burst Emotion Recognition

Soo-Huyng Kim

101

01 Oct 2022

Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation

Interspeech, 2022

136

02 Jul 2022

Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

IEEE Transactions on Dependable and Secure Computing (TDSC), 2022

Lingling Fan

234

07 Jun 2022

Robust Speaker Recognition with Transformers Using wav2vec 2.0

28 Mar 2022

Training speaker recognition systems with limited data

Interspeech, 2022

Nik Vaessen

David A. van Leeuwen

196

28 Mar 2022

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

The Speaker and Language Recognition Workshop (Odyssey), 2022

Xin Wang

338

247

24 Feb 2022