wav2vec: Unsupervised Pre-training for Speech Recognition

11 April 2019
Steffen Schneider
Alexei Baevski
R. Collobert
Michael Auli
    SSL

Papers citing "wav2vec: Unsupervised Pre-training for Speech Recognition"

50 / 191 papers shown
Exploring Representation Learning for Small-Footprint Keyword Spotting
Interspeech, 2022
Fan Cui
Liyong Guo
Quandong Wang
Peng Gao
Yujun Wang
SSL
166
4
0
20 Mar 2023
Adaptive Knowledge Distillation between Text and Speech Pre-trained Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jinjie Ni
Yukun Ma
Wen Wang
Qian Chen
Dianwen Ng
Han Lei
Trung Hieu Nguyen
Chong Zhang
B. Ma
Xiaoshi Zhong
107
3
0
07 Mar 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023
Jaeyoung Huh
Sangjoon Park
Jeonghyeon Lee
Jong Chul Ye
LM&MA
208
15
0
27 Feb 2023
Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zihan Zhao
Yu Wang
Yanfeng Wang
254
21
0
20 Feb 2023
Imitator: Personalized Speech-driven 3D Facial Animation
IEEE International Conference on Computer Vision (ICCV), 2022
Balamurugan Thambiraja
I. Habibie
S. Aliakbarian
Darren Cosker
Christian Theobalt
Justus Thies
CVBM
252
91
0
30 Dec 2022
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mingda Chen
Paul-Ambroise Duquenne
Pierre Yves Andrews
Justine T. Kao
Alexandre Mourachko
Holger Schwenk
Marta R. Costa-jussà
258
23
0
16 Dec 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Interspeech, 2022
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
265
13
0
29 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
IEEE Transactions on Multimedia (IEEE TMM), 2022
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
274
51
0
21 Nov 2022
Biased Self-supervised learning for ASR
Interspeech, 2022
Florian Kreyssig
Yangyang Shi
Jinxi Guo
Leda Sari
Abdel-rahman Mohamed
P. Woodland
SSL
168
4
0
04 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Neural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
378
10
0
02 Nov 2022
Neural Network based Formation of Cognitive Maps of Semantic Spaces and the Emergence of Abstract Concepts
Scientific Reports (Sci Rep), 2022
Paul Stoewer
A. Schilling
Andreas K. Maier
P. Krauss
211
18
0
28 Oct 2022
Simple and Effective Unsupervised Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Changhan Wang
Hirofumi Inaguma
Peng-Jen Chen
Ilia Kulikov
Yun Tang
Wei-Ning Hsu
Michael Auli
J. Pino
SSL
206
19
0
18 Oct 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ruchao Fan
Yiming Wang
Yashesh Gaur
Jinyu Li
283
8
0
16 Oct 2022
Individualized Conditioning and Negative Distances for Speaker Separation
International Conference on Machine Learning and Applications (ICMLA), 2022
Tao Sun
Nidal Abuhajar
Shuyu Gong
Zhewei Wang
Charles D. Smith
Xianhui Wang
Li Xu
Jundong Liu
VLM
163
1
0
12 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Spoken Language Technology Workshop (SLT), 2022
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David Harwath
VLM, CLIP
409
41
0
03 Oct 2022
AudioGen: Textually Guided Audio Generation
International Conference on Learning Representations (ICLR), 2022
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
433
394
0
30 Sep 2022
Improving the Cross-Lingual Generalisation in Visual Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2022
Farhad Nooralahzadeh
Rico Sennrich
250
8
0
07 Sep 2022
Equivariant Self-Supervision for Musical Tempo Estimation
International Society for Music Information Retrieval Conference (ISMIR), 2022
Elio Quinton
272
16
0
03 Sep 2022
SampleMatch: Drum Sample Retrieval by Musical Context
International Society for Music Information Retrieval Conference (ISMIR), 2022
Stefan Lattner
162
12
0
01 Aug 2022
Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge
A. I. S. Ferreira
Gustavo dos Reis Oliveira
191
3
0
29 Jul 2022
Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition
Interspeech, 2022
Zihan Zhao
Yanfeng Wang
Yu Wang
188
43
0
11 Jul 2022
Vers la compréhension automatique de la parole bout-en-bout à moindre effort (Towards End-to-End Spoken Language Understanding with Less Effort)
M. Naguib
François Portet
Marco Dinarelli
SSL
114
0
0
01 Jul 2022
Comparison of Speech Representations for the MOS Prediction System
A. Kunikoshi
Jaebok Kim
Won-Suk Jun
K. Sjölander
101
1
0
28 Jun 2022
Revisiting End-to-End Speech-to-Text Translation From Scratch
International Conference on Machine Learning (ICML), 2022
Biao Zhang
Barry Haddow
Rico Sennrich
193
45
0
09 Jun 2022
Self-supervised models of audio effectively explain human cortical responses to speech
International Conference on Machine Learning (ICML), 2022
Aditya R. Vaidya
Shailee Jain
Alexander G. Huth
186
70
0
27 May 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal of Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL, AI4TS
679
445
0
21 May 2022
Foundation Posteriors for Approximate Probabilistic Inference
Neural Information Processing Systems (NeurIPS), 2022
Mike Wu
Noah D. Goodman
UQCV
228
7
0
19 May 2022
Cross-modal Contrastive Learning for Speech Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Rong Ye
Mingxuan Wang
Lei Li
SSL
251
103
0
05 May 2022
WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment
Lin Yao
Jianfei Song
Rui Xu
Yingfang Yang
Zijian Chen
Yafeng Deng
VLM
172
2
0
22 Apr 2022
End-to-End Speech Translation for Code Switched Speech
Findings, 2022
Orion Weller
Matthias Sperber
Telmo Pires
Hendra Setiawan
Christian Gollan
Dominic Telaar
Matthias Paulik
243
35
0
11 Apr 2022
Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data
AAAI Conference on Artificial Intelligence (AAAI), 2022
Yunxing Kang
Tianqiao Liu
Hang Li
Y. Hao
Wenbiao Ding
167
9
0
10 Apr 2022
Federated Self-supervised Speech Representations: Are We There Yet?
Interspeech, 2022
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Abhinav Mehrotra
Nicholas D. Lane
179
14
0
06 Apr 2022
Successes and critical failures of neural networks in capturing human-like speech recognition
Neural Networks (NN), 2022
Federico Adolfi
J. Bowers
David Poeppel
UQCV
282
27
0
06 Apr 2022
Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck
Interspeech, 2022
Youngsik Eom
Yeonghyeon Lee
Ji Sub Um
Hoi-Rim Kim
219
29
0
04 Apr 2022
How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Spoken Language Technology Workshop (SLT), 2022
Juan Pablo Zuluaga
Amrutha Prasad
Iuliia Nigmatulina
Seyyed Saeed Sarfjoo
P. Motlícek
Matthias Kleinert
H. Helmke
Oliver Ohneiser
Qingran Zhan
265
52
0
31 Mar 2022
Recent improvements of ASR models in the face of adversarial attacks
Interspeech, 2022
R. Olivier
Bhiksha Raj
AAML
291
18
0
29 Mar 2022
Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features
Jialu Li
M. Hasegawa-Johnson
Nancy L. McElwain
128
1
0
29 Mar 2022
Towards Inadequately Pre-trained Models in Transfer Learning
IEEE International Conference on Computer Vision (ICCV), 2022
Andong Deng
Xingjian Li
Di Hu
Tianyang Wang
Haoyi Xiong
Chengzhong Xu
145
8
0
09 Mar 2022
GCNet: Graph Completion Network for Incomplete Multimodal Learning in Conversation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Zheng Lian
Lang Chen
Guoying Zhao
B. Liu
Jianhua Tao
264
181
0
04 Mar 2022
Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation
The Speaker and Language Recognition Workshop (Odyssey), 2022
Hemlata Tak
Massimiliano Todisco
Xin Wang
Jee-weon Jung
Junichi Yamagishi
Nicholas W. D. Evans
358
254
0
24 Feb 2022
Improving CTC-based speech recognition via knowledge transferring from pre-trained language models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Keqi Deng
Songjun Cao
Yike Zhang
Long Ma
Gaofeng Cheng
Ji Xu
Pengyuan Zhang
147
32
0
22 Feb 2022
Assessing the State of Self-Supervised Human Activity Recognition using Wearables
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2022
H. Haresamudram
Irfan Essa
Thomas Plötz
SSL
379
116
0
22 Feb 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
International Conference on Information Photonics (ICIP), 2022
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
274
12
0
15 Feb 2022
A Generic Self-Supervised Framework of Learning Invariant Discriminative Features
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Foivos Ntelemis
Yaochu Jin
S. Thomas
OOD
179
5
0
14 Feb 2022
A Practical Guide to Logical Access Voice Presentation Attack Detection
Xin Wang
Junichi Yamagishi
AAML
203
14
0
10 Jan 2022
A New Amharic Speech Emotion Dataset and Classification Benchmark
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2022
E. A. Retta
Eiad Almekhlafi
R. Sutcliffe
Mustafa Mhamed
Haider Ali
Junlong Feng
103
18
0
07 Jan 2022
Learning Nigerian accent embeddings from speech: preliminary results based on SautiDB-Naija corpus
Tejumade Afonja
Oladimeji Mudele
Iroro Orife
Kenechi Dukor
Lawrence Francis
Duru Goodness
Oluwafemi Azeez
Ademola Malomo
Clinton Mbataku
114
4
0
12 Dec 2021
Towards Learning Universal Audio Representations
Luyu Wang
Pauline Luc
Yan Wu
Adrià Recasens
Lucas Smaira
...
Andrew Jaegle
Jean-Baptiste Alayrac
Sander Dieleman
João Carreira
Aaron van den Oord
SSL
283
77
0
23 Nov 2021
SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Suwon Shon
Ankita Pasad
Felix Wu
Pablo Brusco
Yoav Artzi
Karen Livescu
Kyu Jeong Han
AuLLM, ELM
283
90
0
19 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
APSIPA Transactions on Signal and Information Processing (TASIP), 2021
Jinyu Li
VLM
434
431
0
02 Nov 2021