Improving Transformer-based Speech Recognition Using Unsupervised Pre-training

22 October 2019
Dongwei Jiang
Xiaoning Lei
Wubo Li
Ne Luo
Yuxuan Hu
Wei Zou
Xiangang Li
arXiv: 1910.09932

Papers citing "Improving Transformer-based Speech Recognition Using Unsupervised Pre-training"

50 / 56 papers shown
Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology
Weinan Dai
Yifeng Jiang
Yuanjing Liu
Jinkun Chen
Xin Sun
Jinglei Tao
SSL
132
1
0
31 Aug 2024
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers
Hosein Mohebbi
Grzegorz Chrupała
Willem H. Zuidema
Afra Alishahi
144
19
0
15 Oct 2023
Indonesian Automatic Speech Recognition with XLSR-53
Social Science Research Network (SSRN), 2022
Panji Arisaputra
Amalia Zahra
100
10
0
20 Aug 2023
On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
Interspeech, 2023
Gene-Ping Yang
Yue Gu
Qingming Tang
Dongsu Du
Yuzong Liu
145
6
0
06 Jul 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model
IEEE International Joint Conference on Neural Network (IJCNN), 2023
Jianzong Wang
Xulong Zhang
Haobin Tang
Aolan Sun
Ning Cheng
Jing Xiao
188
1
0
23 Apr 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
Eng Siong Chng
257
17
0
11 Apr 2023
Self-supervised speech representation learning for keyword-spotting with light-weight transformers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chenyang Gao
Yue Gu
Francesco Calivá
Yuzong Liu
OffRL
140
6
0
07 Mar 2023
Dual Learning for Large Vocabulary On-Device ASR
Spoken Language Technology Workshop (SLT), 2023
Cal Peyser
Ronny Huang
Tara N. Sainath
Rohit Prabhavalkar
M. Picheny
K. Cho
SSL
129
1
0
11 Jan 2023
PulseImpute: A Novel Benchmark Task for Pulsative Physiological Signal Imputation
Neural Information Processing Systems (NeurIPS), 2022
Maxwell A. Xu
Alexander Moreno
Supriya Nagesh
V. Aydemir
D. Wetter
Santosh Kumar
James M. Rehg
AI4TS
137
10
0
14 Dec 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
Automatic Speech Recognition & Understanding (ASRU), 2022
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
196
18
0
17 Nov 2022
Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach
International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022
Xulong Zhang
Jianzong Wang
Ning Cheng
Kexin Zhu
Jing Xiao
118
1
0
25 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Spoken Language Technology Workshop (SLT), 2022
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM, SSL
196
38
0
16 Oct 2022
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
International Society for Music Information Retrieval Conference (ISMIR), 2022
Longshen Ou
Xiangming Gu
Ye Wang
157
24
0
20 Jul 2022
MET: Masked Encoding for Tabular Data
Kushal Majmundar
Sachin Goyal
Praneeth Netrapalli
Prateek Jain
LMTD
110
0
0
17 Jun 2022
Speaker Identification using Speech Recognition
Syeda Rabia Arshad
Syed Mujtaba Haider
Abdul Basit Mughal
96
1
0
29 May 2022
Speech Augmentation Based Unsupervised Learning for Keyword Spotting
IEEE International Joint Conference on Neural Network (IJCNN), 2022
Jian Luo
Jianzong Wang
Ning Cheng
Haobin Tang
Jing Xiao
SSL
136
2
0
28 May 2022
Adaptive multilingual speech recognition with pretrained models
Interspeech, 2022
Ngoc-Quan Pham
A. Waibel
Jan Niehues
VLM
141
24
0
24 May 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL, AI4TS
574
433
0
21 May 2022
Audio Self-supervised Learning: A Survey
Patterns, 2022
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
206
125
0
02 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL, AI4TS, SSL
207
13
0
01 Mar 2022
A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Qiu-shi Zhu
Jie Zhang
Zi-qiang Zhang
Ming Wu
Xin Fang
Lirong Dai
261
51
0
22 Jan 2022
Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription
Nikolai Vogler
J. Allen
M. Miller
Taylor Berg-Kirkpatrick
89
5
0
16 Dec 2021
SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Suwon Shon
Ankita Pasad
Felix Wu
Pablo Brusco
Yoav Artzi
Karen Livescu
Kyu Jeong Han
AuLLM, ELM
219
90
0
19 Nov 2021
SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition
Interspeech, 2021
Li Fu
Xiaoxiao Li
Runyu Wang
Lu Fan
Zhengchen Zhang
Meng Chen
Youzheng Wu
Xiaodong He
SSL
140
3
0
08 Oct 2021
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Hang Li
Yunxing Kang
Tianqiao Liu
Wenbiao Ding
Zitao Liu
142
20
0
01 Sep 2021
CLSRIL-23: Cross Lingual Speech Representations for Indic Languages
Anirudh Gupta
Harveen Singh Chadha
Priyanshi Shah
Neeraj Chimmwal
Ankur Dhuriya
Rishabh Gaur
Vivek Raghavan
122
41
0
15 Jul 2021
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation
Interspeech, 2021
Jian Luo
Jianzong Wang
Ning Cheng
Jing Xiao
SSL
135
6
0
09 Jul 2021
Low Resource German ASR with Untranscribed Data Spoken by Non-native Children -- INTERSPEECH 2021 Shared Task SPAPL System
Interspeech, 2021
Jinhan Wang
Yunzheng Zhu
Ruchao Fan
Wei Chu
Abeer Alwan
90
8
0
18 Jun 2021
Speech BERT Embedding For Improving Prosody in Neural TTS
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Liping Chen
Yan Deng
Xi Wang
Frank Soong
Lei He
185
25
0
08 Jun 2021
Unsupervised Speech Recognition
Neural Information Processing Systems (NeurIPS), 2021
Alexei Baevski
Wei-Ning Hsu
Alexis Conneau
Michael Auli
SSL
337
292
0
24 May 2021
Improving speech recognition models with small samples for air traffic control systems
Neurocomputing, 2021
Yi Lin
Qin Li
Bo Yang
Zhen Yan
Huachun Tan
Zhengmao Chen
166
33
0
16 Feb 2021
Bi-APC: Bidirectional Autoregressive Predictive Coding for Unsupervised Pre-training and Its Application to Children's ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Ruchao Fan
Amber Afshan
Abeer Alwan
124
14
0
12 Feb 2021
Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-resource Speech Recognition
IEEE Signal Processing Letters (IEEE SPL), 2021
Cheng Yi
Shiyu Zhou
Bo Xu
152
44
0
17 Jan 2021
Applying Wav2vec2.0 to Speech Recognition in Various Low-resource Languages
Cheng Yi
Jianzhong Wang
Ning Cheng
Shiyu Zhou
Bo Xu
SSL, VLM
147
87
0
22 Dec 2020
Sequence-to-Sequence Contrastive Learning for Text Recognition
Computer Vision and Pattern Recognition (CVPR), 2020
Aviad Aberdam
Ron Litman
Shahar Tsiper
Oron Anschel
Ron Slossberg
Shai Mazor
R. Manmatha
Pietro Perona
202
121
0
20 Dec 2020
DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization
Shaoshi Ling
Yuzong Liu
140
112
0
11 Dec 2020
Exploring wav2vec 2.0 on speaker verification and language identification
Interspeech, 2020
Zhiyun Fan
Meng Li
Shiyu Zhou
Bo Xu
224
223
0
11 Dec 2020
Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Jeff Lai
Jin Cao
S. Bodapati
Shang-Wen Li
SSL
165
7
0
11 Nov 2020
Learning Contextual Tag Embeddings for Cross-Modal Alignment of Audio and Tags
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Xavier Favory
Konstantinos Drossos
Maria Sandsten
Xavier Serra
195
16
0
27 Oct 2020
Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning
Interspeech, 2020
Dongwei Jiang
Wubo Li
Miao Cao
Wei Zou
Xiangang Li
SSL
254
71
0
27 Oct 2020
Similarity Analysis of Self-Supervised Speech Representations
Yu-An Chung
Yonatan Belinkov
James R. Glass
SSL
330
44
0
22 Oct 2020
Self-training and Pre-training are Complementary for Speech Recognition
Qiantong Xu
Alexei Baevski
Tatiana Likhomanenko
Paden Tomasello
Alexis Conneau
R. Collobert
Gabriel Synnaeve
Michael Auli
SSL, VLM
243
176
0
22 Oct 2020
A Transformer Based Pitch Sequence Autoencoder with MIDI Augmentation
Mingshuo Ding
Yi Ma
123
1
0
15 Oct 2020
Representation Learning for Sequence Data with Deep Autoencoding Predictive Components
Junwen Bai
Weiran Wang
Yingbo Zhou
Caiming Xiong
SSL, AI4TS
169
12
0
07 Oct 2020
Transformer with Bidirectional Decoder for Speech Recognition
Interspeech, 2020
Xi Chen
Songyang Zhang
Dandan Song
P. Ouyang
Shouyi Yin
111
15
0
11 Aug 2020
Transformer based unsupervised pre-training for acoustic representation learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Ruixiong Zhang
Haiwei Wu
Wubo Li
Dongwei Jiang
Wei Zou
Xiangang Li
SSL, ViT
223
30
0
29 Jul 2020
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Andy T. Liu
Shang-Wen Li
Hung-yi Lee
SSL
484
389
0
12 Jul 2020
Unsupervised Cross-lingual Representation Learning for Speech Recognition
Interspeech, 2020
Alexis Conneau
Alexei Baevski
R. Collobert
Abdel-rahman Mohamed
Michael Auli
SSL
301
900
0
24 Jun 2020
Embodied Self-supervised Learning by Coordinated Sampling and Training
Yifan Sun
Xihong Wu
SSL
116
9
0
20 Jun 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
1.1K
7,195
0
20 Jun 2020