TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech

12 July 2020

Papers citing "TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech"

50 / 215 papers shown

Title
Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization Xiaokang Zhao Qiu-shi Zhu Jie M. Zhang 25 4 0 28 Sep 2022
End-to-End Lyrics Recognition with Self-supervised Learning Xiangyu Zhang Shuyue Stella Li Zhanhong He R. Togneri Leibny Paola García 20 0 0 26 Sep 2022
Information-Theoretic Hashing for Zero-Shot Cross-Modal Retrieval Yufeng Shi Shujian Yu Duanquan Xu Xinge You 18 1 0 26 Sep 2022
Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech Jaejin Cho Jesús Villalba Laureano Moro Velázquez Najim Dehak SSL 20 16 0 10 Aug 2022
COCOA: Cross Modality Contrastive Learning for Sensor Data Shohreh Deldari Hao Xue Aaqib Saeed Daniel V. Smith Flora D. Salim SSL 34 38 0 31 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models Takanori Ashihara Takafumi Moriya Kohei Matsuura Tomohiro Tanaka 17 25 0 14 Jul 2022
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion Wen-Chin Huang Shu-Wen Yang Tomoki Hayashi T. Toda 10 15 0 10 Jul 2022
FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning Yeonghyeon Lee Kangwook Jang Jahyun Goo Youngmoon Jung Hoi-Rim Kim 10 28 0 01 Jul 2022
FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition Szu-Jui Chen Jiamin Xie John H. L. Hansen 14 8 0 30 Jun 2022
The THUEE System Description for the IARPA OpenASR21 Challenge Jing Zhao Haoyu Wang Jinpeng Li Shuzhou Chai Guan-Bo Wang Guoguo Chen Weiqiang Zhang VLM 17 1 0 29 Jun 2022
Boosting Cross-Domain Speech Recognition with Self-Supervision Hanjing Zhu Gaofeng Cheng Jindong Wang Wenxin Hou Pengyuan Zhang Yonghong Yan 11 13 0 20 Jun 2022
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR Ruchao Fan Abeer Alwan 19 30 0 16 Jun 2022
Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project Jan Lehecka J. Psutka Josef Psutka 6 4 0 15 Jun 2022
Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition Anjana Arunkumar Vrunda N. Sukhadia S. Umesh 17 10 0 11 Jun 2022
Feature Learning and Ensemble Pre-Tasks Based Self-Supervised Speech Denoising and Dereverberation Yi Li ShuangLin Li Yang Sun S. M. Naqvi 4 0 0 10 Jun 2022
Joint Encoder-Decoder Self-Supervised Pre-training for ASR Arunkumar A S. Umesh SSL 24 8 0 09 Jun 2022
Beyond Just Vision: A Review on Self-Supervised Representation Learning on Multimodal and Temporal Data Shohreh Deldari Hao Xue Aaqib Saeed Jiayuan He Daniel V. Smith Flora D. Salim AI4TS 17 37 0 06 Jun 2022
Contrastive Siamese Network for Semi-supervised Speech Recognition S. Khorram Jaeyoung Kim Anshuman Tripathi Han Lu Qian Zhang Hasim Sak SSL 6 11 0 27 May 2022
Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR Qiu-shi Zhu Jie M. Zhang Zitian Zhang Lirong Dai 35 15 0 26 May 2022
Self-Supervised Speech Representation Learning: A Review Abdel-rahman Mohamed Hung-yi Lee Lasse Borgholt Jakob Drachmann Havtorn Joakim Edin ... Shang-Wen Li Karen Livescu Lars Maaløe Tara N. Sainath Shinji Watanabe SSL AI4TS 124 344 0 21 May 2022
Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information Chiyu Feng Po-Chun Hsu Hung-yi Lee SSL 17 8 0 08 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework Ziyi Yang Yuwei Fang Chenguang Zhu Reid Pryzant Dongdong Chen ... Bin Xiao Yuanxun Lu Takuya Yoshioka Michael Zeng Xuedong Huang 40 45 0 03 May 2022
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation Daisuke Niizumi Daiki Takeuchi Yasunori Ohishi N. Harada K. Kashino 19 65 0 26 Apr 2022
ATST: Audio Representation Learning with Teacher-Student Transformer Xian Li Xiaofei Li ViT 17 20 0 26 Apr 2022
On-demand compute reduction with stochastic wav2vec 2.0 Apoorv Vyas Wei-Ning Hsu Michael Auli Alexei Baevski 8 13 0 25 Apr 2022
Cross-stitched Multi-modal Encoders Karan Singla Daniel Pressel Ryan Price Bhargav Srinivas Chinnari Yeon-Jun Kim S. Bangalore 16 0 0 20 Apr 2022
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers Kaizhi Qian Yang Zhang Heting Gao Junrui Ni Cheng-I Jeff Lai David D. Cox M. Hasegawa-Johnson Shiyu Chang DRL 14 110 0 20 Apr 2022
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding Changtong Zan Liang Ding Li Shen Yu Cao Weifeng Liu Dacheng Tao LRM 24 8 0 16 Apr 2022
Points to Patches: Enabling the Use of Self-Attention for 3D Shape Recognition Axel Berg Magnus Oskarsson Mark O'Connor 3DPC ViT 19 26 0 08 Apr 2022
Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning Eesung Kim J. Jeon Hyeji Seo Ho-Young Kim SSL 19 37 0 08 Apr 2022
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores Wei-Cheng Tseng Wei-Tsung Kao Hung-yi Lee 11 21 0 07 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation Dan Berrebbi Jiatong Shi Brian Yan Osbel López-Francisco Jonathan D. Amith Shinji Watanabe 8 26 0 05 Apr 2022
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition Abner Hernandez Paula Andrea Pérez-Toro Elmar Nöth J. Orozco-Arroyave Andreas K. Maier S. Yang 15 38 0 04 Apr 2022
On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting Wei-Tsung Kao Yue Wu Chia-Ping Chen Zhi-Sheng Chen Yu-Pao Tsai Hung-yi Lee 6 7 0 01 Apr 2022
Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition Ashish Seth L. D. Prasad Sreyan Ghosh S. Umesh 28 3 0 31 Mar 2022
PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations L. D. Prasad Sreyan Ghosh S. Umesh 17 12 0 31 Mar 2022
Error Correction Code Transformer Yoni Choukroun Lior Wolf 11 47 0 27 Mar 2022
DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning Sreyan Ghosh Ashish Seth Deepak Mittal Maneesh Singh S. Umesh SSL 22 6 0 25 Mar 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention Zuzana Jelčicová Marian Verhelst 26 5 0 20 Mar 2022
Similarity and Content-based Phonetic Self Attention for Speech Recognition Kyuhong Shim Wonyong Sung 8 7 0 19 Mar 2022
A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing Richard He Bai Renjie Zheng Junkun Chen Xintong Li Mingbo Ma Liang Huang 16 49 0 18 Mar 2022
Learning Audio Representations with MLPs Mashrur M. Morshed Ahmad Omar Ahsan H. Mahmud Md. Kamrul Hasan 13 4 0 16 Mar 2022
Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling Tiantian Feng Shrikanth Narayanan 25 17 0 15 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities Hsiang-Sheng Tsai Heng-Jui Chang Wen-Chin Huang Zili Huang Kushal Lakhotia ... Hsuan-Jui Chen Shang-Wen Li Shinji Watanabe Abdel-rahman Mohamed Hung-yi Lee 18 109 0 14 Mar 2022
Audio Self-supervised Learning: A Survey Shuo Liu Adria Mallol-Ragolta Emilia Parada-Cabaleiro Kun Qian Xingshuo Jing Alexander Kathan Bin Hu Bjoern W. Schuller SSL 22 106 0 02 Mar 2022
Towards a Common Speech Analysis Engine Hagai Aronowitz Itai Gat E. Morais Weizhong Zhu R. Hoory 12 3 0 01 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning Lasse Borgholt Jakob Drachmann Havtorn Joakim Edin Lars Maaløe Christian Igel BDL AI4TS SSL 19 11 0 01 Mar 2022
Assessing the State of Self-Supervised Human Activity Recognition using Wearables H. Haresamudram Irfan Essa Thomas Plötz SSL 25 84 0 22 Feb 2022
RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing Efthymios Tzinis Yossi Adi V. Ithapu Buye Xu Paris Smaragdis Anurag Kumar CLL 14 54 0 17 Feb 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition Zitian Zhang Jie M. Zhang Jian-Shu Zhang Ming Wu Xin Fang Lirong Dai SSL 22 10 0 15 Feb 2022