1904.05862
wav2vec: Unsupervised Pre-training for Speech Recognition
11 April 2019
Steffen Schneider
Alexei Baevski
R. Collobert
Michael Auli
SSL
Papers citing "wav2vec: Unsupervised Pre-training for Speech Recognition"
50 / 191 papers shown
Exploring Representation Learning for Small-Footprint Keyword Spotting
Interspeech (Interspeech), 2022
Fan Cui
Liyong Guo
Quandong Wang
Peng Gao
Yujun Wang
SSL
166
4
0
20 Mar 2023
Adaptive Knowledge Distillation between Text and Speech Pre-trained Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jinjie Ni
Yukun Ma
Wen Wang
Qian Chen
Dianwen Ng
Han Lei
Trung Hieu Nguyen
Chong Zhang
B. Ma
Xiaoshi Zhong
107
3
0
07 Mar 2023
Improving Medical Speech-to-Text Accuracy with Vision-Language Pre-training Model
IEEE journal of biomedical and health informatics (IEEE JBHI), 2023
Jaeyoung Huh
Sangjoon Park
Jeonghyeon Lee
Jong Chul Ye
LM&MA
208
15
0
27 Feb 2023
Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zihan Zhao
Yu Wang
Yanfeng Wang
254
21
0
20 Feb 2023
Imitator: Personalized Speech-driven 3D Facial Animation
IEEE International Conference on Computer Vision (ICCV), 2022
Balamurugan Thambiraja
I. Habibie
S. Aliakbarian
Darren Cosker
Christian Theobalt
Justus Thies
CVBM
252
91
0
30 Dec 2022
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mingda Chen
Paul-Ambroise Duquenne
Pierre Yves Andrews
Justine T. Kao
Alexandre Mourachko
Holger Schwenk
Marta R. Costa-jussá
258
23
0
16 Dec 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Interspeech (Interspeech), 2022
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
265
13
0
29 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
IEEE transactions on multimedia (IEEE TMM), 2022
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
274
51
0
21 Nov 2022
Biased Self-supervised learning for ASR
Interspeech (Interspeech), 2022
Florian Kreyssig
Yangyang Shi
Jinxi Guo
Leda Sari
Abdel-rahman Mohamed
P. Woodland
SSL
168
4
0
04 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Neural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
378
10
0
02 Nov 2022
Neural Network based Formation of Cognitive Maps of Semantic Spaces and the Emergence of Abstract Concepts
Scientific Reports (Sci Rep), 2022
Paul Stoewer
A. Schilling
Andreas K. Maier
P. Krauss
211
18
0
28 Oct 2022
Simple and Effective Unsupervised Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Changhan Wang
Hirofumi Inaguma
Peng-Jen Chen
Ilia Kulikov
Yun Tang
Wei-Ning Hsu
Michael Auli
J. Pino
SSL
206
19
0
18 Oct 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ruchao Fan
Yiming Wang
Yashesh Gaur
Jinyu Li
283
8
0
16 Oct 2022
Individualized Conditioning and Negative Distances for Speaker Separation
International Conference on Machine Learning and Applications (ICMLA), 2022
Tao Sun
Nidal Abuhajar
Shuyu Gong
Zhewei Wang
Charles D. Smith
Xianhui Wang
Li Xu
Jundong Liu
VLM
163
1
0
12 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Spoken Language Technology Workshop (SLT), 2022
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David Harwath
VLM
CLIP
409
41
0
03 Oct 2022
AudioGen: Textually Guided Audio Generation
International Conference on Learning Representations (ICLR), 2022
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
433
394
0
30 Sep 2022
Improving the Cross-Lingual Generalisation in Visual Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2022
Farhad Nooralahzadeh
Rico Sennrich
250
8
0
07 Sep 2022
Equivariant Self-Supervision for Musical Tempo Estimation
International Society for Music Information Retrieval Conference (ISMIR), 2022
Elio Quinton
272
16
0
03 Sep 2022
SampleMatch: Drum Sample Retrieval by Musical Context
International Society for Music Information Retrieval Conference (ISMIR), 2022
Stefan Lattner
162
12
0
01 Aug 2022
Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge
A. I. S. Ferreira
Gustavo dos Reis Oliveira
191
3
0
29 Jul 2022
Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition
Interspeech (Interspeech), 2022
Zihan Zhao
Yanfeng Wang
Yu Wang
188
43
0
11 Jul 2022
Vers la compréhension automatique de la parole bout-en-bout à moindre effort
M. Naguib
François Portet
Marco Dinarelli
SSL
114
0
0
01 Jul 2022
Comparison of Speech Representations for the MOS Prediction System
A. Kunikoshi
Jaebok Kim
Won-Suk Jun
K. Sjölander
101
1
0
28 Jun 2022
Revisiting End-to-End Speech-to-Text Translation From Scratch
International Conference on Machine Learning (ICML), 2022
Biao Zhang
Barry Haddow
Rico Sennrich
193
45
0
09 Jun 2022
Self-supervised models of audio effectively explain human cortical responses to speech
International Conference on Machine Learning (ICML), 2022
Aditya R. Vaidya
Shailee Jain
Alexander G. Huth
186
70
0
27 May 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
679
445
0
21 May 2022
Foundation Posteriors for Approximate Probabilistic Inference
Neural Information Processing Systems (NeurIPS), 2022
Mike Wu
Noah D. Goodman
UQCV
228
7
0
19 May 2022
Cross-modal Contrastive Learning for Speech Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Rong Ye
Mingxuan Wang
Lei Li
SSL
251
103
0
05 May 2022
WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment
Lin Yao
Jianfei Song
Rui Xu
Yingfang Yang
Zijian Chen
Yafeng Deng
VLM
172
2
0
22 Apr 2022
End-to-End Speech Translation for Code Switched Speech
Findings (Findings), 2022
Orion Weller
Matthias Sperber
Telmo Pires
Hendra Setiawan
Christian Gollan
Dominic Telaar
Matthias Paulik
243
35
0
11 Apr 2022
Self-Supervised Audio-and-Text Pre-training with Extremely Low-Resource Parallel Data
AAAI Conference on Artificial Intelligence (AAAI), 2022
Yunxing Kang
Tianqiao Liu
Hang Li
Y. Hao
Wenbiao Ding
167
9
0
10 Apr 2022
Federated Self-supervised Speech Representations: Are We There Yet?
Interspeech (Interspeech), 2022
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Abhinav Mehrotra
Nicholas D. Lane
179
14
0
06 Apr 2022
Successes and critical failures of neural networks in capturing human-like speech recognition
Neural Networks (NN), 2022
Federico Adolfi
J. Bowers
David Poeppel
UQCV
282
27
0
06 Apr 2022
Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck
Interspeech (Interspeech), 2022
Youngsik Eom
Yeonghyeon Lee
Ji Sub Um
Hoi-Rim Kim
219
29
0
04 Apr 2022
How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Spoken Language Technology Workshop (SLT), 2022
Juan Pablo Zuluaga
Amrutha Prasad
Iuliia Nigmatulina
Seyyed Saeed Sarfjoo
P. Motlícek
Matthias Kleinert
H. Helmke
Oliver Ohneiser
Qingran Zhan
265
52
0
31 Mar 2022
Recent improvements of ASR models in the face of adversarial attacks
Interspeech (Interspeech), 2022
R. Olivier
Bhiksha Raj
AAML
291
18
0
29 Mar 2022
Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features
Jialu Li
M. Hasegawa-Johnson
Nancy L. McElwain
128
1
0
29 Mar 2022
Towards Inadequately Pre-trained Models in Transfer Learning
IEEE International Conference on Computer Vision (ICCV), 2022
Andong Deng
Xingjian Li
Di Hu
Tianyang Wang
Haoyi Xiong
Chengzhong Xu
145
8
0
09 Mar 2022
GCNet: Graph Completion Network for Incomplete Multimodal Learning in Conversation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Zheng Lian
Lang Chen
Guoying Zhao
B. Liu
Jianhua Tao
264
181
0
04 Mar 2022
Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation
The Speaker and Language Recognition Workshop (Odyssey), 2022
Hemlata Tak
Massimiliano Todisco
Xin Wang
Jee-weon Jung
Junichi Yamagishi
Nicholas W. D. Evans
358
254
0
24 Feb 2022
Improving CTC-based speech recognition via knowledge transferring from pre-trained language models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Keqi Deng
Songjun Cao
Yike Zhang
Long Ma
Gaofeng Cheng
Ji Xu
Pengyuan Zhang
147
32
0
22 Feb 2022
Assessing the State of Self-Supervised Human Activity Recognition using Wearables
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2022
H. Haresamudram
Irfan Essa
Thomas Plötz
SSL
379
116
0
22 Feb 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
International Conference on Information Photonics (ICIP), 2022
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
274
12
0
15 Feb 2022
A Generic Self-Supervised Framework of Learning Invariant Discriminative Features
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022
Foivos Ntelemis
Yaochu Jin
S. Thomas
OOD
179
5
0
14 Feb 2022
A Practical Guide to Logical Access Voice Presentation Attack Detection
Xin Wang
Junichi Yamagishi
AAML
203
14
0
10 Jan 2022
A New Amharic Speech Emotion Dataset and Classification Benchmark
ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 2022
E. A. Retta
Eiad Almekhlafi
R. Sutcliffe
Mustafa Mhamed
Haider Ali
Junlong Feng
103
18
0
07 Jan 2022
Learning Nigerian accent embeddings from speech: preliminary results based on SautiDB-Naija corpus
Tejumade Afonja
Oladimeji Mudele
Iroro Orife
Kenechi Dukor
Lawrence Francis
Duru Goodness
Oluwafemi Azeez
Ademola Malomo
Clinton Mbataku
114
4
0
12 Dec 2021
Towards Learning Universal Audio Representations
Luyu Wang
Pauline Luc
Yan Wu
Adrià Recasens
Lucas Smaira
...
Andrew Jaegle
Jean-Baptiste Alayrac
Sander Dieleman
João Carreira
Aaron van den Oord
SSL
283
77
0
23 Nov 2021
SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Suwon Shon
Ankita Pasad
Felix Wu
Pablo Brusco
Yoav Artzi
Karen Livescu
Kyu Jeong Han
AuLLM
ELM
283
90
0
19 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
APSIPA Transactions on Signal and Information Processing (TASIP), 2021
Jinyu Li
VLM
434
431
0
02 Nov 2021