wav2vec: Unsupervised Pre-training for Speech Recognition

11 April 2019

Papers citing "wav2vec: Unsupervised Pre-training for Speech Recognition"

50 / 190 papers shown

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

International Conference on Learning Representations (ICLR), 2024

Jiahao Cui

Hui Li

Yao Yao

Hao Zhu

Hanlin Shang

Kaihui Cheng

Hang Zhou

Siyu Zhu

Jingdong Wang

DiffM VGen

334

10 Oct 2024

Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

Christian Schroeder de Witt

194

09 Oct 2024

InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries

Asian Conference on Machine Learning (ACML), 2024

Chen Jason Zhang

219

29 Sep 2024

DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis

Fa-Ting Hong

Yunfei Liu

Yu Li

239

16 Sep 2024

Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

410

16 Sep 2024

Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages

298

13 Sep 2024

Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models

Jin Sob Kim

Hyun Joon Park

Wooseok Shin

Juan Yun

Sung Won Han

SLR

454

12 Sep 2024

What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Kavya Manohar

Leena G Pillai

236

04 Sep 2024

CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention

553

03 Sep 2024

GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis

Yijie Jin

209

27 Aug 2024

Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation

305

20 Aug 2024

ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2024

247

28 Jul 2024

Sentiment Reasoning for Healthcare

385

24 Jul 2024

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions

Jiajiong Cao

274

159

11 Jul 2024

STONE: Self-supervised Tonality Estimator

Yuexuan Kong

Vincent Lostanlen

Gabriel Meseguer-Brocal

Stella Wong

Mathieu Lagrange

Romain Hennequin

329

10 Jul 2024

Unsupervised Concept Drift Detection from Deep Learning Representations in Real-time

261

24 Jun 2024

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Qingkun Su

Ce Liu

Yao Yao

Siyu Zhu

VGen

255

166

13 Jun 2024

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Eungbeom Kim

Hantae Kim

Kyogu Lee

185

12 Jun 2024

MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations

Interspeech (Interspeech), 2024

305

09 Jun 2024

SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

Xilin Jiang

322

20 May 2024

Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition

286

01 May 2024

Mai Hoʻomāuna i ka ʻAi: Language Models Improve Automatic Speech Recognition in Hawaiian

157

03 Apr 2024

FeatUp: A Model-Agnostic Framework for Features at Any Resolution

William T. Freeman

317

15 Mar 2024

VoxGenesis: Unsupervised Discovery of Latent Speaker Manifold for Speech Synthesis

Kong Aik Lee

241

01 Mar 2024

Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART

140

01 Mar 2024

EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

318

218

27 Feb 2024

Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0

247

27 Feb 2024

Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues

276

13 Feb 2024

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Chen Chen

Ruizhe Li

Yuchen Hu

Sabato Marco Siniscalchi

Pin-Yu Chen

Ensiong Chng

Chao-Han Huck Yang

227

08 Feb 2024

Streaming Sequence Transduction through Dynamic Compression

500

02 Feb 2024

MF-AED-AEC: Speech Emotion Recognition by Leveraging Multimodal Fusion, ASR Error Detection, and ASR Error Correction

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

215

24 Jan 2024

Towards Weakly Supervised Text-to-Audio Grounding

Kai Yu

353

05 Jan 2024

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

...

472

13 Dec 2023

A-JEPA: Joint-Embedding Predictive Architecture Can Listen

Zhengcong Fei

Mingyuan Fan

Junshi Huang

377

27 Nov 2023

Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables

European Signal Processing Conference (EUSIPCO), 2023

Ahmed Adel Attia

Yashish M. Siriwardena

Carol Espy-Wilson

SSL

221

17 Sep 2023

Indonesian Automatic Speech Recognition with XLSR-53

Social Science Research Network (SSRN), 2022

Panji Arisaputra

Amalia Zahra

116

20 Aug 2023

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Yuxuan Wang

326

378

10 Aug 2023

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023

298

20 Jul 2023

Multimodal Audio-textual Architecture for Robust Spoken Language Understanding

Anderson R. Avila

Mehdi Rezagholizadeh

Chao Xing

162

12 Jun 2023

PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models

Affective Computing and Intelligent Interaction (ACII), 2023

Tiantian Feng

Shrikanth Narayanan

257

08 Jun 2023

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

Interspeech (Interspeech), 2023

178

02 Jun 2023

Scaling Speech Technology to 1,000+ Languages

Journal of machine learning research (JMLR), 2023

...

Yossi Adi

389

522

22 May 2023

Duplex Diffusion Models Improve Speech-to-Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Xianchao Wu

DiffM

219

22 May 2023

TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Tiantian Feng

Rajat Hebbar

Shrikanth Narayanan

167

18 May 2023

A Survey on Time-Series Pre-Trained Models

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2023

282

18 May 2023

A multimodal dynamical variational autoencoder for audiovisual speech representation learning

Neural Networks (NN), 2022

Samir Sadok

Simon Leglaive

Laurent Girin

Xavier Alameda-Pineda

Renaud Séguier

356

05 May 2023

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

ACM Multimedia (ACM MM), 2023

...

Björn W. Schuller

271

18 Apr 2023

HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition

Soumya Dutta

Sriram Ganapathy

354

14 Apr 2023

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Yuchen Hu

Cheng Chen

Qiu-shi Zhu

Eng Siong Chng

298

11 Apr 2023

Transformer-based Self-supervised Multimodal Representation Learning for Wearable Emotion Recognition

IEEE Transactions on Affective Computing (IEEE Trans. Affective Comput.), 2023

Yujin Wu

Mohamed Daoudi

A. Amad

196

29 Mar 2023