TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech

12 July 2020
Andy T. Liu
Shang-Wen Li
Hung-yi Lee
    SSL

Papers citing "TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech"

50 / 215 papers shown
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT
CLIP
13
25
0
07 Jun 2023
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana
Nauman Dawalatabad
Antoine Laurent
Luis Vicente
Pablo Gimeno
Victoria Mingote
James R. Glass
VLM
14
1
0
01 Jun 2023
MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models
Yu-Hsiang Wang
Huan Chen
Kai-Wei Chang
Winston H. Hsu
Hung-yi Lee
16
6
0
30 May 2023
GMSF: Global Matching Scene Flow
Yushan Zhang
Johan Edstedt
Bastian Wandt
Per-Erik Forssén
Maria Magnusson
M. Felsberg
29
9
0
27 May 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Wangyou Zhang
Y. Qian
38
10
0
25 May 2023
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
37
3
0
23 May 2023
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
Eklavya Sarkar
Mathew Magimai.-Doss
16
11
0
23 May 2023
Wav2SQL: Direct Generalizable Speech-To-SQL Parsing
Huadai Liu
Rongjie Huang
Jinzheng He
Gang Sun
Ran Shen
Xize Cheng
Zhou Zhao
25
3
0
21 May 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Kangwook Jang
Sungnyun Kim
Se-Young Yun
Hoi-Rim Kim
19
5
0
19 May 2023
TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition
Tiantian Feng
Rajat Hebbar
Shrikanth Narayanan
28
7
0
18 May 2023
A Survey on Time-Series Pre-Trained Models
Qianli Ma
Z. Liu
Zhenjing Zheng
Ziyang Huang
Siying Zhu
Zhongzhong Yu
James T. Kwok
AI4TS
21
50
0
18 May 2023
Speech Separation based on Contrastive Learning and Deep Modularization
Peter Ochieng
SSL
22
0
0
18 May 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model
Jianzong Wang
Xulong Zhang
Haobin Tang
Aolan Sun
Ning Cheng
Jing Xiao
13
1
0
23 Apr 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
22
6
0
22 Apr 2023
Denoising Cosine Similarity: A Theory-Driven Approach for Efficient Representation Learning
Takumi Nakagawa
Y. Sanada
Hiroki Waida
Yuhui Zhang
Yuichiro Wada
K. Takanashi
Tomonori Yamada
Takafumi Kanamori
DiffM
19
5
0
19 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
23
2
0
12 Apr 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
E. Chng
18
15
0
11 Apr 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
16
10
0
12 Mar 2023
BrainBERT: Self-supervised representation learning for intracranial recordings
Christopher Wang
Vighnesh Subramaniam
A. Yaari
Gabriel Kreiman
Boris Katz
Ignacio Cases
Andrei Barbu
MedIm
SSL
21
31
0
28 Feb 2023
A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit
Mina Huh
Ruchira Ray
Corey Karnei
14
3
0
27 Feb 2023
Nearest Neighbor-Based Contrastive Learning for Hyperspectral and LiDAR Data Classification
Meng Wang
Feng Gao
Junyu Dong
Hengchao Li
Q. Du
SSL
31
66
0
09 Jan 2023
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM
SSL
27
91
0
14 Dec 2022
Progressive Multi-Scale Self-Supervised Learning for Speech Recognition
Genshun Wan
Tan Liu
Hang Chen
Jia-Yu Pan
Cong Liu
Z. Ye
SSL
10
0
0
07 Dec 2022
Deep neural network techniques for monaural speech enhancement: state of the art analysis
P. Ochieng
28
21
0
01 Dec 2022
EURO: ESPnet Unsupervised ASR Open-source Toolkit
Dongji Gao
Jiatong Shi
Shun-Po Chuang
Leibny Paola García-Perera
Hung-yi Lee
Shinji Watanabe
Sanjeev Khudanpur
19
8
0
30 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie M. Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
25
37
0
21 Nov 2022
Self-Transriber: Few-shot Lyrics Transcription with Self-training
Xiaoxue Gao
Xianghu Yue
Haizhou Li
17
7
0
18 Nov 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
19
13
0
17 Nov 2022
Introducing Semantics into Speech Encoders
Derek Xu
Shuyan Dong
Changhan Wang
Suyoun Kim
Zhaojiang Lin
...
Alexei Baevski
Guan-Ting Lin
Hung-yi Lee
Yizhou Sun
Wei Wang
SSL
20
3
0
15 Nov 2022
Improving Children's Speech Recognition by Fine-tuning Self-supervised Adult Speech Representations
Renée Lu
M. Shahin
Beena Ahmed
22
4
0
14 Nov 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
20
20
0
14 Nov 2022
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Yifan Peng
Siddhant Arora
Yosuke Higuchi
Yushi Ueda
Sujay S. Kumar
Karthik Ganesan
Siddharth Dalmia
Xuankai Chang
Shinji Watanabe
19
19
0
10 Nov 2022
A Context-Aware Computational Approach for Measuring Vocal Entrainment in Dyadic Conversations
Rimita Lahiri
Md. Nasir
C. Lord
So Hyun Kim
Shrikanth Narayanan
11
4
0
07 Nov 2022
Evaluation of Automated Speech Recognition Systems for Conversational Speech: A Linguistic Perspective
H. Pasandi
Haniyeh B. Pasandi
16
1
0
05 Nov 2022
Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models
Sathvik Udupa
Siddarth C
P. Ghosh
19
7
0
30 Oct 2022
Application of Knowledge Distillation to Multi-task Speech Representation Learning
Mine Kerpicci
V. Nguyen
Shuhua Zhang
Erik M. Visser
20
0
0
29 Oct 2022
Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0
Marie Kunesova
Zbynek Zajíc
SSL
VLM
13
15
0
26 Oct 2022
Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19
M. Gauy
Marcelo Finger
11
7
0
25 Oct 2022
Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach
Xulong Zhang
Jianzong Wang
Ning Cheng
Kexin Zhu
Jing Xiao
11
0
0
25 Oct 2022
Combining Contrastive and Non-Contrastive Losses for Fine-Tuning Pretrained Models in Speech Analysis
Florian Lux
Ching-Yi Chen
Ngoc Thang Vu
16
1
0
21 Oct 2022
Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech
Cheol Jun Cho
Peter Wu
Abdel-rahman Mohamed
Gopala K. Anumanchipalli
21
29
0
21 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
19
33
0
16 Oct 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Ruchao Fan
Yiming Wang
Yashesh Gaur
Jinyu Li
33
7
0
16 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Nigel G. Ward
16
47
0
13 Oct 2022
Individualized Conditioning and Negative Distances for Speaker Separation
Tao Sun
Nidal Abuhajar
Shuyu Gong
Zhewei Wang
Charles D. Smith
Xianhui Wang
Li Xu
Jundong Liu
VLM
12
1
0
12 Oct 2022
On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding
G. Laperriere
Valentin Pelloin
Mickael Rouvier
Themos Stafylakis
Yannick Esteve
27
9
0
11 Oct 2022
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
38
18
0
05 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David F. Harwath
VLM
CLIP
38
32
0
03 Oct 2022
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling
Itai Gat
Felix Kreuk
Tu Nguyen
Ann Lee
Jade Copet
Gabriel Synnaeve
Emmanuel Dupoux
Yossi Adi
20
11
0
30 Sep 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
25
54
0
30 Sep 2022