ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

SUPERB: Speech processing Universal PERformance Benchmark


3 May 2021
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
Yist Y. Lin
Andy T. Liu
Jiatong Shi
Xuankai Chang
Guan-Ting Lin
Tzu-hsien Huang
Wei-Cheng Tseng
Ko-tik Lee
Da-Rong Liu
Zili Huang
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
    SSL

Papers citing "SUPERB: Speech processing Universal PERformance Benchmark"

50 / 160 papers shown
Title
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
24
33
0
16 Oct 2022
RedApt: An Adaptor for wav2vec 2 Encoding: Faster and Smaller Speech Translation without Quality Compromise
Jinming Zhao
Haomiao Yang
Gholamreza Haffari
Ehsan Shareghi
VLM
11
2
0
16 Oct 2022
Improving generalizability of distilled self-supervised speech processing models under distorted settings
Kuan-Po Huang
Yu-Kuan Fu
Tsung-Yuan Hsu
Fabian Ritter Gutierrez
Fan Wang
Liang-Hsuan Tseng
Yu Zhang
Hung-yi Lee
24
13
0
14 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Nigel G. Ward
21
47
0
13 Oct 2022
An empirical study of weakly supervised audio tagging embeddings for general audio representations
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
27
1
0
30 Sep 2022
Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization
Xiaokang Zhao
Qiu-shi Zhu
Jie M. Zhang
30
4
0
28 Sep 2022
Extending RNN-T-based speech recognition systems with emotion and language classification
Zvi Kons
Hagai Aronowitz
E. Morais
Matheus Damasceno
H. Kuo
Samuel Thomas
G. Saon
9
5
0
28 Jul 2022
FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning
Yeonghyeon Lee
Kangwook Jang
Jahyun Goo
Youngmoon Jung
Hoi-Rim Kim
10
28
0
01 Jul 2022
Toward Low-Cost End-to-End Spoken Language Understanding
Marco Dinarelli
M. Naguib
François Portet
9
5
0
01 Jul 2022
RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion
Dacheng Yin
Chuanxin Tang
Yanqing Liu
Xiaoqiang Wang
Zhiyuan Zhao
Yucheng Zhao
Zhiwei Xiong
Sheng Zhao
Chong Luo
16
12
0
28 Jun 2022
Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models
Han Ji
T. Patel
O. Scharenborg
29
7
0
24 Jun 2022
Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track
Tilak Purohit
Imen Ben Mahmoud
Bogdan Vlasenko
Mathew Magimai.-Doss
SSL
15
8
0
23 Jun 2022
Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition
Anjana Arunkumar
Vrunda N. Sukhadia
S. Umesh
17
10
0
11 Jun 2022
Self-supervised models of audio effectively explain human cortical responses to speech
Aditya R. Vaidya
Shailee Jain
Alexander G. Huth
15
42
0
27 May 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
124
348
0
21 May 2022
Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Chiyu Feng
Po-Chun Hsu
Hung-yi Lee
SSL
20
8
0
08 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
40
37
0
02 May 2022
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
14
110
0
20 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
SSL
34
53
0
15 Apr 2022
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance
Lin Zhang
Xin Wang
Erica Cooper
Nicholas W. D. Evans
Junichi Yamagishi
16
56
0
11 Apr 2022
GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye
Chengqi Zhao
Tom Ko
Chutong Meng
Tao Wang
Mingxuan Wang
Jun Cao
9
23
0
08 Apr 2022
Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning
Eesung Kim
J. Jeon
Hyeji Seo
Ho-Young Kim
SSL
21
37
0
08 Apr 2022
MTI-Net: A Multi-Target Speech Intelligibility Prediction Model
Ryandhimas E. Zezario
Szu-Wei Fu
Fei Chen
C. Fuh
Hsin-Min Wang
Yu Tsao
19
13
0
07 Apr 2022
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri
Peng-Jen Chen
Changhan Wang
J. Pino
Yossi Adi
Jiatao Gu
Wei-Ning Hsu
Ann Lee
14
56
0
06 Apr 2022
User-Level Differential Privacy against Attribute Inference Attack of Speech Emotion Recognition in Federated Learning
Tiantian Feng
Raghuveer Peri
Shrikanth Narayanan
FedML
11
28
0
05 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi
Jiatong Shi
Brian Yan
Osbel López-Francisco
Jonathan D. Amith
Shinji Watanabe
8
26
0
05 Apr 2022
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers
Soumi Maiti
Yushi Ueda
Shinji Watanabe
Chunlei Zhang
Meng Yu
Shi-Xiong Zhang
Yong-mei Xu
26
32
0
31 Mar 2022
PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations
L. D. Prasad
Sreyan Ghosh
S. Umesh
17
12
0
31 Mar 2022
SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks
Kai-Wei Chang
Wei-Cheng Tseng
Shang-Wen Li
Hung-yi Lee
17
22
0
31 Mar 2022
Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation
Kuan Po Huang
Yuanbin Fu
Yu Zhang
Hung-yi Lee
14
28
0
30 Mar 2022
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
Rui Wang
Qibing Bai
Junyi Ao
Long Zhou
Zhixiang Xiong
Zhihua Wei
Yu Zhang
Tom Ko
Haizhou Li
28
61
0
29 Mar 2022
A Speech Representation Anonymization Framework via Selective Noise Perturbation
Minh Tran
M. Soleymani
22
4
0
26 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabaleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
22
106
0
02 Mar 2022
Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training
Ramon Sanabria
Wei-Ning Hsu
Alexei Baevski
Michael Auli
11
7
0
01 Mar 2022
Towards a Common Speech Analysis Engine
Hagai Aronowitz
Itai Gat
E. Morais
Weizhong Zhu
R. Hoory
12
3
0
01 Mar 2022
Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation
Hemlata Tak
Massimiliano Todisco
Xin Wang
Jee-weon Jung
Junichi Yamagishi
Nicholas W. D. Evans
27
151
0
24 Feb 2022
Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models
Vrunda N. Sukhadia
S. Umesh
17
8
0
18 Feb 2022
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models
Sarala Padi
S. O. Sadjadi
Dinesh Manocha
Ram D. Sriram
22
35
0
16 Feb 2022
Speaker Normalization for Self-supervised Speech Emotion Recognition
Itai Gat
Hagai Aronowitz
Weizhong Zhu
E. Morais
R. Hoory
25
50
0
02 Feb 2022
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Emanuele Bugliarello
Fangyu Liu
Jonas Pfeiffer
Siva Reddy
Desmond Elliott
E. Ponti
Ivan Vulić
MLLM
VLM
ELM
35
62
0
27 Jan 2022
Bias in Automated Speaker Recognition
Wiebke Toussaint
Aaron Yi Ding
CVBM
27
44
0
24 Jan 2022
Self-Supervised Learning for speech recognition with Intermediate layer supervision
Chengyi Wang
Yu-Huan Wu
Sanyuan Chen
Shujie Liu
Jinyu Li
Yao Qian
Zhenglu Yang
SSL
16
28
0
16 Dec 2021
On the Use of External Data for Spoken Named Entity Recognition
Ankita Pasad
Felix Wu
Suwon Shon
Karen Livescu
Kyu Jeong Han
32
16
0
14 Dec 2021
ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
Siddhant Arora
Siddharth Dalmia
Pavel Denisov
Xuankai Chang
Yushi Ueda
...
Karthik Ganesan
Brian Yan
Ngoc Thang Vu
A. Black
Shinji Watanabe
VLM
23
74
0
29 Nov 2021
Speech Tasks Relevant to Sleepiness Determined with Deep Transfer Learning
Bang Tran
Youxiang Zhu
Xiaohui Liang
J. Schwoebel
L. Warrenburg
8
7
0
29 Nov 2021
Towards Learning Universal Audio Representations
Luyu Wang
Pauline Luc
Yan Wu
Adrià Recasens
Lucas Smaira
...
Andrew Jaegle
Jean-Baptiste Alayrac
Sander Dieleman
João Carreira
Aaron van den Oord
SSL
24
68
0
23 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
11
111
0
03 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
73
1,694
0
26 Oct 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
30
268
0
19 Oct 2021
Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning
Yi-Chen Chen
Shu-Wen Yang
Cheng-Kuang Lee
Simon See
Hung-yi Lee
SSL
11
12
0
18 Oct 2021