wav2vec: Unsupervised Pre-training for Speech Recognition

11 April 2019

Papers citing "wav2vec: Unsupervised Pre-training for Speech Recognition"

50 / 191 papers shown

Exploring Representation Learning for Small-Footprint Keyword Spotting

Interspeech (Interspeech), 2022

Liyong Guo

Yujun Wang

166

20 Mar 2023

Adaptive Knowledge Distillation between Text and Speech Pre-trained Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jinjie Ni

Yukun Ma

Wen Wang

Qian Chen

107

07 Mar 2023

Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model

IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023

208

27 Feb 2023

Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Zihan Zhao

Yu Wang

Yanfeng Wang

254

20 Feb 2023

Imitator: Personalized Speech-driven 3D Facial Animation

IEEE International Conference on Computer Vision (ICCV), 2022

Balamurugan Thambiraja

252

30 Dec 2022

BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Mingda Chen

Paul-Ambroise Duquenne

258

16 Dec 2022

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

Interspeech (Interspeech), 2022

Xiaohuan Zhou

Jiaming Wang

Zeyu Cui

Shiliang Zhang

Zhijie Yan

Jingren Zhou

Chang Zhou

265

29 Nov 2022

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

IEEE Transactions on Multimedia (IEEE TMM), 2022

274

21 Nov 2022

Biased Self-supervised learning for ASR

Interspeech (Interspeech), 2022

168

04 Nov 2022

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Neural Information Processing Systems (NeurIPS), 2022

Kaizhi Qian

378

02 Nov 2022

Neural Network based Formation of Cognitive Maps of Semantic Spaces and the Emergence of Abstract Concepts

Scientific Reports (Sci Rep), 2022

211

28 Oct 2022

Simple and Effective Unsupervised Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

206

18 Oct 2022

CTCBERT: Advancing Hidden-unit BERT with CTC Objectives

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

283

16 Oct 2022

Individualized Conditioning and Negative Distances for Speaker Separation

International Conference on Machine Learning and Applications (ICMLA), 2022

163

12 Oct 2022

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Spoken Language Technology Workshop (SLT), 2022

409

03 Oct 2022

AudioGen: Textually Guided Audio Generation

International Conference on Learning Representations (ICLR), 2022

Devi Parikh

Yossi Adi

433

394

30 Sep 2022

Improving the Cross-Lingual Generalisation in Visual Question Answering

AAAI Conference on Artificial Intelligence (AAAI), 2022

Farhad Nooralahzadeh

Rico Sennrich

250

07 Sep 2022

Equivariant Self-Supervision for Musical Tempo Estimation

International Society for Music Information Retrieval Conference (ISMIR), 2022

Elio Quinton

272

03 Sep 2022

SampleMatch: Drum Sample Retrieval by Musical Context

International Society for Music Information Retrieval Conference (ISMIR), 2022

Stefan Lattner

162

01 Aug 2022

Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge

A. I. S. Ferreira

Gustavo dos Reis Oliveira

191

29 Jul 2022

Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition

Interspeech (Interspeech), 2022

Zihan Zhao

Yanfeng Wang

Yu Wang

188

11 Jul 2022

Vers la compréhension automatique de la parole bout-en-bout à moindre effort [Towards end-to-end spoken language understanding with less effort]

M. Naguib

François Portet

Marco Dinarelli

SSL

114

01 Jul 2022

Comparison of Speech Representations for the MOS Prediction System

101

28 Jun 2022

Revisiting End-to-End Speech-to-Text Translation From Scratch

International Conference on Machine Learning (ICML), 2022

Biao Zhang

Barry Haddow

Rico Sennrich

193

09 Jun 2022

Self-supervised models of audio effectively explain human cortical responses to speech

International Conference on Machine Learning (ICML), 2022

Aditya R. Vaidya

Shailee Jain

Alexander G. Huth

186

27 May 2022

Self-Supervised Speech Representation Learning: A Review

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

Abdel-rahman Mohamed

Hung-yi Lee

Lasse Borgholt

Jakob Drachmann Havtorn

...

679

445

21 May 2022

Foundation Posteriors for Approximate Probabilistic Inference

Neural Information Processing Systems (NeurIPS), 2022

Mike Wu

Noah D. Goodman

UQCV

228

19 May 2022

Cross-modal Contrastive Learning for Speech Translation

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Rong Ye

Mingxuan Wang

Lei Li

SSL

251

103

05 May 2022

WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment

172

22 Apr 2022

End-to-End Speech Translation for Code Switched Speech

Findings (Findings), 2022

243

11 Apr 2022

Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data

AAAI Conference on Artificial Intelligence (AAAI), 2022

167

10 Apr 2022

Federated Self-supervised Speech Representations: Are We There Yet?

Interspeech (Interspeech), 2022

Yan Gao

Javier Fernandez-Marques

Titouan Parcollet

Abhinav Mehrotra

Nicholas D. Lane

179

06 Apr 2022

Successes and critical failures of neural networks in capturing human-like speech recognition

Neural Networks (NN), 2022

282

06 Apr 2022

Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck

Interspeech (Interspeech), 2022

219

04 Apr 2022

How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications

Spoken Language Technology Workshop (SLT), 2022

265

31 Mar 2022

Recent improvements of ASR models in the face of adversarial attacks

Interspeech (Interspeech), 2022

R. Olivier

Bhiksha Raj

AAML

291

29 Mar 2022

Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features

Jialu Li

M. Hasegawa-Johnson

Nancy L. McElwain

128

29 Mar 2022

Towards Inadequately Pre-trained Models in Transfer Learning

IEEE International Conference on Computer Vision (ICCV), 2022

Haoyi Xiong

145

09 Mar 2022

GCNet: Graph Completion Network for Incomplete Multimodal Learning in Conversation

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

264

181

04 Mar 2022

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

The Speaker and Language Recognition Workshop (Odyssey), 2022

Xin Wang

358

254

24 Feb 2022

Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Pengyuan Zhang

147

22 Feb 2022

Assessing the State of Self-Supervised Human Activity Recognition using Wearables

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2022

379

116

22 Feb 2022

Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition

IEEE International Conference on Image Processing (ICIP), 2022

274

15 Feb 2022

A Generic Self-Supervised Framework of Learning Invariant Discriminative Features

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022

179

14 Feb 2022

A Practical Guide to Logical Access Voice Presentation Attack Detection

Xin Wang

Junichi Yamagishi

AAML

203

10 Jan 2022

A New Amharic Speech Emotion Dataset and Classification Benchmark

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2022

103

07 Jan 2022

Learning Nigerian accent embeddings from speech: preliminary results based on SautiDB-Naija corpus

114

12 Dec 2021

Towards Learning Universal Audio Representations

...

Jean-Baptiste Alayrac

283

23 Nov 2021

SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

283

19 Nov 2021

Recent Advances in End-to-End Automatic Speech Recognition

APSIPA Transactions on Signal and Information Processing (TASIP), 2021

Jinyu Li

VLM

434

431

02 Nov 2021