Improving Transformer-based Speech Recognition Using Unsupervised Pre-training

22 October 2019

Wei Zou, Xiangang Li

Papers citing "Improving Transformer-based Speech Recognition Using Unsupervised Pre-training"

50 of 56 citing papers shown
Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology. Weinan Dai, Yifeng Jiang, Yuanjing Liu, Jinkun Chen, Xin Sun, Jinglei Tao. 31 Aug 2024.
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers. Hosein Mohebbi, Grzegorz Chrupała, Willem H. Zuidema, Afra Alishahi. 15 Oct 2023.
Indonesian Automatic Speech Recognition with XLSR-53. Social Science Research Network (SSRN), 2022. Panji Arisaputra, Amalia Zahra. 20 Aug 2023.
On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation. Interspeech, 2023. Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu. 06 Jul 2023.
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model. IEEE International Joint Conference on Neural Networks (IJCNN), 2023. Jianzong Wang, Xulong Zhang, Haobin Tang, Aolan Sun, Ning Cheng, Jing Xiao. 23 Apr 2023.
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023. Yuchen Hu, Cheng Chen, Qiu-shi Zhu, Eng Siong Chng. 11 Apr 2023.
Self-supervised speech representation learning for keyword-spotting with light-weight transformers. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023. Chenyang Gao, Yue Gu, Francesco Calivá, Yuzong Liu. 07 Mar 2023.
Dual Learning for Large Vocabulary On-Device ASR. Spoken Language Technology Workshop (SLT), 2023. Cal Peyser, Ronny Huang, Tara N. Sainath, Rohit Prabhavalkar, M. Picheny, K. Cho. 11 Jan 2023.
PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation. Neural Information Processing Systems (NeurIPS), 2022. Maxwell A. Xu, Alexander Moreno, Supriya Nagesh, V. Aydemir, D. Wetter, Santosh Kumar, James M. Rehg. 14 Dec 2022.
MelHuBERT: A simplified HuBERT on Mel spectrograms. Automatic Speech Recognition & Understanding (ASRU), 2022. Tzu-Quan Lin, Hung-yi Lee, Hao Tang. 17 Nov 2022.
Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach. International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022. Xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao. 25 Oct 2022.
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning. Spoken Language Technology Workshop (SLT), 2022. Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-Wen Yang, Tzu-Quan Lin, ..., Xuankai Chang, Shinji Watanabe, Abdel-rahman Mohamed, Shang-Wen Li, Hung-yi Lee. 16 Oct 2022.
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription. International Society for Music Information Retrieval Conference (ISMIR), 2022. Longshen Ou, Xiangming Gu, Ye Wang. 20 Jul 2022.
MET: Masked Encoding for Tabular Data. Kushal Majmundar, Sachin Goyal, Praneeth Netrapalli, Prateek Jain. 17 Jun 2022.
Speaker Identification using Speech Recognition. Syeda Rabia Arshad, Syed Mujtaba Haider, Abdul Basit Mughal. 29 May 2022.
Speech Augmentation Based Unsupervised Learning for Keyword Spotting. IEEE International Joint Conference on Neural Networks (IJCNN), 2022. Jian Luo, Jianzong Wang, Ning Cheng, Haobin Tang, Jing Xiao. 28 May 2022.
Adaptive multilingual speech recognition with pretrained models. Interspeech, 2022. Ngoc-Quan Pham, A. Waibel, Jan Niehues. 24 May 2022.
Self-Supervised Speech Representation Learning: A Review. IEEE Journal of Selected Topics in Signal Processing (IEEE JSTSP), 2022. Abdel-rahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, ..., Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe. 21 May 2022.
Audio Self-supervised Learning: A Survey. Patterns, 2022. Shuo Liu, Adria Mallol-Ragolta, Emilia Parada-Cabeleiro, Kun Qian, Xingshuo Jing, Alexander Kathan, Bin Hu, Bjoern W. Schuller. 02 Mar 2022.
A Brief Overview of Unsupervised Neural Speech Representation Learning. Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, Lars Maaløe, Christian Igel. 01 Mar 2022.
A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022. Qiu-shi Zhu, Jie Zhang, Zi-qiang Zhang, Ming Wu, Xin Fang, Lirong Dai. 22 Jan 2022.
Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription. Nikolai Vogler, J. Allen, M. Miller, Taylor Berg-Kirkpatrick. 16 Dec 2021.
SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021. Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu Jeong Han. 19 Nov 2021.
SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition. Interspeech, 2021. Li Fu, Xiaoxiao Li, Runyu Wang, Lu Fan, Zhengchen Zhang, Meng Chen, Youzheng Wu, Xiaodong He. 08 Oct 2021.
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. Hang Li, Yunxing Kang, Tianqiao Liu, Wenbiao Ding, Zitao Liu. 01 Sep 2021.
CLSRIL-23: Cross Lingual Speech Representations for Indic Languages. Anirudh Gupta, Harveen Singh Chadha, Priyanshi Shah, Neeraj Chimmwal, Ankur Dhuriya, Rishabh Gaur, Vivek Raghavan. 15 Jul 2021.
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation. Interspeech, 2021. Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao. 09 Jul 2021.
Low Resource German ASR with Untranscribed Data Spoken by Non-native Children -- INTERSPEECH 2021 Shared Task SPAPL System. Interspeech, 2021. Jinhan Wang, Yunzheng Zhu, Ruchao Fan, Wei Chu, Abeer Alwan. 18 Jun 2021.
Speech BERT Embedding For Improving Prosody in Neural TTS. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021. Liping Chen, Yan Deng, Xi Wang, Frank Soong, Lei He. 08 Jun 2021.
Unsupervised Speech Recognition. Neural Information Processing Systems (NeurIPS), 2021. Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli. 24 May 2021.
Improving speech recognition models with small samples for air traffic control systems. Neurocomputing, 2021. Yi Lin, Qin Li, Bo Yang, Zhen Yan, Huachun Tan, Zhengmao Chen. 16 Feb 2021.
Bi-APC: Bidirectional Autoregressive Predictive Coding for Unsupervised Pre-training and Its Application to Children's ASR. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021. Ruchao Fan, Amber Afshan, Abeer Alwan. 12 Feb 2021.
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition. IEEE Signal Processing Letters (IEEE SPL), 2021. Cheng Yi, Shiyu Zhou, Bo Xu. 17 Jan 2021.
Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages. Cheng Yi, Jianzhong Wang, Ning Cheng, Shiyu Zhou, Bo Xu. 22 Dec 2020.
Sequence-to-Sequence Contrastive Learning for Text Recognition. Computer Vision and Pattern Recognition (CVPR), 2020. Aviad Aberdam, Ron Litman, Shahar Tsiper, Oron Anschel, Ron Slossberg, Shai Mazor, R. Manmatha, Pietro Perona. 20 Dec 2020.
DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization. Shaoshi Ling, Yuzong Liu. 11 Dec 2020.
Exploring wav2vec 2.0 on speaker verification and language identification. Interspeech, 2020. Zhiyun Fan, Meng Li, Shiyu Zhou, Bo Xu. 11 Dec 2020.
Towards Semi-Supervised Semantics Understanding from Speech. Cheng-I Jeff Lai, Jin Cao, S. Bodapati, Shang-Wen Li. 11 Nov 2020.
Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020. Xavier Favory, Konstantinos Drossos, Maria Sandsten, Xavier Serra. 27 Oct 2020.
Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning. Interspeech, 2020. Dongwei Jiang, Wubo Li, Miao Cao, Wei Zou, Xiangang Li. 27 Oct 2020.
Similarity Analysis of Self-Supervised Speech Representations. Yu-An Chung, Yonatan Belinkov, James R. Glass. 22 Oct 2020.
Self-training and Pre-training are Complementary for Speech Recognition. Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, R. Collobert, Gabriel Synnaeve, Michael Auli. 22 Oct 2020.
A Transformer Based Pitch Sequence Autoencoder with MIDI Augmentation. Mingshuo Ding, Yi Ma. 15 Oct 2020.
Representation Learning for Sequence Data with Deep Autoencoding Predictive Components. Junwen Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong. 07 Oct 2020.
Transformer with Bidirectional Decoder for Speech Recognition. Interspeech, 2020. Xi Chen, Songyang Zhang, Dandan Song, P. Ouyang, Shouyi Yin. 11 Aug 2020.
Transformer based unsupervised pre-training for acoustic representation learning. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020. Ruixiong Zhang, Haiwei Wu, Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li. 29 Jul 2020.
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2020. Andy T. Liu, Shang-Wen Li, Hung-yi Lee. 12 Jul 2020.
Unsupervised Cross-lingual Representation Learning for Speech Recognition. Interspeech, 2020. Alexis Conneau, Alexei Baevski, R. Collobert, Abdel-rahman Mohamed, Michael Auli. 24 Jun 2020.
Embodied Self-supervised Learning by Coordinated Sampling and Training. Yifan Sun, Xihong Wu. 20 Jun 2020.
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli. 20 Jun 2020.