ResearchTrend.AI

TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech

12 July 2020
Andy T. Liu
Shang-Wen Li
Hung-yi Lee
    SSL

Papers citing "TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech"

50 / 215 papers shown
USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder
Bolaji Yusuf
Ankur Gandhe
Alex Sokolov
24
8
0
12 Feb 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
27
833
0
07 Feb 2022
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling
Puyuan Peng
David F. Harwath
SSL
28
26
0
07 Feb 2022
Self-supervised Speaker Recognition Training Using Human-Machine Dialogues
Metehan Cekic
Ruirui Li
Zeya Chen
Yuguang Yang
A. Stolcke
Upamanyu Madhow
SSL
9
2
0
07 Feb 2022
Speech Emotion Recognition using Self-Supervised Features
E. Morais
R. Hoory
Weizhong Zhu
Itai Gat
Matheus Damasceno
Hagai Aronowitz
SSL
MDE
18
112
0
07 Feb 2022
Speaker Normalization for Self-supervised Speech Emotion Recognition
Itai Gat
Hagai Aronowitz
Weizhong Zhu
E. Morais
R. Hoory
25
50
0
02 Feb 2022
A Pre-trained Audio-Visual Transformer for Emotion Recognition
Minh Tran
M. Soleymani
58
25
0
23 Jan 2022
A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition
Qiu-shi Zhu
Jie M. Zhang
Zi-qiang Zhang
Ming Wu
Xin Fang
Lirong Dai
115
39
0
22 Jan 2022
Using Deep Learning with Large Aggregated Datasets for COVID-19 Classification from Cough
Esin Darici Haritaoglu
Nicholas Rasmussen
Daniel C. H. Tan
J. Jennifer Ranjani
Jaclyn Xiao
...
Minami Yamaura
Laura Gomezjurado
Aaron Broukhim
Amil Khanzada
Mert Pilanci
10
9
0
05 Jan 2022
Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings
Tiantian Feng
H. Hashemi
Rajat Hebbar
M. Annavaram
Shrikanth S. Narayanan
11
25
0
26 Dec 2021
Self-Supervised Learning based Monaural Speech Enhancement with Multi-Task Pre-Training
Yi Li
Yang Sun
S. M. Naqvi
SSL
24
0
0
21 Dec 2021
Self-Supervised Learning based Monaural Speech Enhancement with Complex-Cycle-Consistent
Yi Li
Yang Sun
S. M. Naqvi
14
1
0
21 Dec 2021
Self-Supervised Learning for speech recognition with Intermediate layer supervision
Chengyi Wang
Yu-Huan Wu
Sanyuan Chen
Shujie Liu
Jinyu Li
Yao Qian
Zhenglu Yang
SSL
16
28
0
16 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
38
685
0
08 Dec 2021
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
Siddhant Arora
Siddharth Dalmia
Pavel Denisov
Xuankai Chang
Yushi Ueda
...
Karthik Ganesan
Brian Yan
Ngoc Thang Vu
A. Black
Shinji Watanabe
VLM
20
74
0
29 Nov 2021
Membership Inference Attacks Against Self-supervised Speech Models
Wei-Cheng Tseng
Wei-Tsung Kao
Hung-yi Lee
30
14
0
09 Nov 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
8
55
0
07 Nov 2021
A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
Yingzhi Wang
Abdelmoumene Boumadane
A. Heba
18
144
0
04 Nov 2021
Combining Unsupervised and Text Augmented Semi-Supervised Learning for Low Resourced Autoregressive Speech Recognition
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
SSL
17
2
0
29 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
73
1,694
0
26 Oct 2021
Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning
Yi-Chen Chen
Shu-Wen Yang
Cheng-Kuang Lee
Simon See
Hung-yi Lee
SSL
9
12
0
18 Oct 2021
DECAR: Deep Clustering for learning general-purpose Audio Representations
Sreyan Ghosh
Sandesh V Katta
Ashish Seth
S. Umesh
SSL
31
12
0
17 Oct 2021
Don't speak too fast: The impact of data bias on self-supervised speech models
Yen Meng
Yi-Hui Chou
Andy T. Liu
Hung-yi Lee
34
24
0
15 Oct 2021
Attention-Free Keyword Spotting
Mashrur M. Morshed
Ahmad Omar Ahsan
23
8
0
14 Oct 2021
Detecting Corrupted Labels Without Training a Model to Predict
Zhaowei Zhu
Zihao Dong
Yang Liu
NoLa
141
62
0
12 Oct 2021
Word Order Does Not Matter For Speech Recognition
Vineel Pratap
Qiantong Xu
Tatiana Likhomanenko
Gabriel Synnaeve
R. Collobert
27
4
0
12 Oct 2021
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
Sanyuan Chen
Yu Wu
Chengyi Wang
Zhengyang Chen
Zhuo Chen
...
Jian Wu
Yao Qian
Furu Wei
Jinyu Li
Xiangzhan Yu
SSL
22
84
0
12 Oct 2021
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Xuankai Chang
Takashi Maekaku
Pengcheng Guo
Jing Shi
Yen-Ju Lu
...
Tianzi Wang
Shu-Wen Yang
Yu Tsao
Hung-yi Lee
Shinji Watanabe
SSL
AI4TS
14
81
0
09 Oct 2021
Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR
Hanjing Zhu
Li Wang
Jindong Wang
Gaofeng Cheng
Pengyuan Zhang
Yonghong Yan
SSL
VLM
14
9
0
09 Oct 2021
Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask
Shaoshi Ling
Chen Shen
Meng Cai
Zejun Ma
VLM
SSL
20
8
0
08 Oct 2021
Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models
Liang-Hsuan Tseng
Yu-Kuan Fu
Heng-Jui Chang
Hung-yi Lee
SSL
28
14
0
07 Oct 2021
DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
Heng-Jui Chang
Shu-Wen Yang
Hung-yi Lee
SSL
22
163
0
05 Oct 2021
Comparison of Self-Supervised Speech Pre-Training Methods on Flemish Dutch
Jakob Poncelet
Hugo Van hamme
SSL
21
1
0
29 Sep 2021
Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model
Zhongwei Teng
Quchen Fu
Jules White
Maria E. Powell
Douglas C. Schmidt
8
5
0
06 Sep 2021
CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations
Hang Li
Yunxing Kang
Tianqiao Liu
Wenbiao Ding
Zitao Liu
17
17
0
01 Sep 2021
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Yu-An Chung
Yu Zhang
Wei Han
Chung-Cheng Chiu
James Qin
Ruoming Pang
Yonghui Wu
SSL
VLM
10
408
0
07 Aug 2021
Dropout Regularization for Self-Supervised Learning of Transformer Encoder Speech Representation
Jian Luo
Jianzong Wang
Ning Cheng
Jing Xiao
SSL
11
6
0
09 Jul 2021
What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis
Shammur A. Chowdhury
Nadir Durrani
Ahmed M. Ali
25
12
0
01 Jul 2021
Using Self-Supervised Feature Extractors with Attention for Automatic COVID-19 Detection from Speech
John Mendonça
Rubén Solera-Ureña
A. Abad
Isabel Trancoso
SSL
32
1
0
30 Jun 2021
Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition
Ruirui Li
C. Ju
Zeya Chen
Hongda Mao
Oguz H. Elibol
A. Stolcke
9
3
0
18 Jun 2021
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
Cheng-I Jeff Lai
Yang Zhang
Alexander H. Liu
Shiyu Chang
Yi-Lun Liao
Yung-Sung Chuang
Kaizhi Qian
Sameer Khurana
David D. Cox
James R. Glass
VLM
49
70
0
10 Jun 2021
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning
Haibin Wu
Xu Li
Andy T. Liu
Zhiyong Wu
H. Meng
Hung-yi Lee
AAML
SSL
29
29
0
01 Jun 2021
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
22
885
0
03 May 2021
Layer Reduction: Accelerating Conformer-Based Self-Supervised Model via Layer Consistency
Jinchuan Tian
Rongzhi Gu
Helin Wang
Yuexian Zou
16
0
0
08 Apr 2021
Utilizing Self-supervised Representations for MOS Prediction
Wei-Cheng Tseng
Chien-yu Huang
Wei-Tsung Kao
Yist Y. Lin
Hung-yi Lee
SSL
11
63
0
07 Apr 2021
Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model
Apoorv Vyas
S. Madikeri
H. Bourlard
6
15
0
06 Apr 2021
Keyword Transformer: A Self-Attention Model for Keyword Spotting
Axel Berg
Mark O'Connor
M. T. Cruz
11
129
0
01 Apr 2021
XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition
Zi-qiang Zhang
Yan Song
Ming Wu
Xin Fang
Lirong Dai
SSL
14
21
0
15 Mar 2021
Adversarial defense for automatic speaker verification by cascaded self-supervised learning models
Haibin Wu
Xu Li
Andy T. Liu
Zhiyong Wu
H. Meng
Hung-yi Lee
AAML
27
40
0
14 Feb 2021
Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation
Renjie Zheng
Junkun Chen
Mingbo Ma
Liang Huang
26
69
0
10 Feb 2021