ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.13900
  4. Cited By
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
    SSL
ArXivPDFHTML

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,021 papers shown
Title
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition
  Systems A case study for Modern Greek
Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems A case study for Modern Greek
Georgios Paraskevopoulos
Theodoros Kouzelis
Georgios Rouvalis
Athanasios Katsamanis
V. Katsouros
Alexandros Potamianos
VLM
10
7
0
31 Dec 2022
Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation
Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation
Tomer Wullach
Shlomo E. Chazan
11
1
0
27 Dec 2022
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding
  Tasks
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks
Suwon Shon
Siddhant Arora
Chyi-Jiunn Lin
Ankita Pasad
Felix Wu
Roshan S. Sharma
Wei Yu Wu
Hung-yi Lee
Karen Livescu
Shinji Watanabe
ELM
13
31
0
20 Dec 2022
Exploring Effective Fusion Algorithms for Speech Based Self-Supervised
  Learning Models
Exploring Effective Fusion Algorithms for Speech Based Self-Supervised Learning Models
Changli Tang
Yujin Wang
Xie Chen
Weiqiang Zhang
15
1
0
20 Dec 2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
14
250
0
18 Dec 2022
Efficient Self-supervised Learning with Contextualized Target
  Representations for Vision, Speech and Language
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM
SSL
27
90
0
14 Dec 2022
Speech and Natural Language Processing Technologies for Pseudo-Pilot
  Simulator
Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator
Amrutha Prasad
Juan Pablo Zuluaga
P. Motlícek
Seyyed Saeed Sarfjoo
Iuliia Nigmatulina
Karel Veselý
15
3
0
14 Dec 2022
Disentangling Prosody Representations with Unsupervised Speech
  Reconstruction
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Leyuan Qu
Taiha Li
C. Weber
Theresa Pekarek-Rosin
F. Ren
S. Wermter
8
8
0
14 Dec 2022
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models
  of Different Modalities
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Zhe Zhao
Yudong Li
Cheng-An Hou
Jing-xin Zhao
Rong Tian
...
Xingwu Sun
Zhanhui Kang
Xiaoyong Du
Linlin Shen
Kimmo Yan
VLM
21
16
0
13 Dec 2022
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in
  Transformers
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers
Yasheng Sun
Hang Zhou
Kaisiyuan Wang
Qianyi Wu
Zhibin Hong
Jingtuo Liu
Errui Ding
Jingdong Wang
Ziwei Liu
Koike Hideki
19
33
0
09 Dec 2022
Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit
Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit
Pengcheng Li
Genshun Wan
Fenglin Ding
Hang Chen
Jianqing Gao
Jia-Yu Pan
Cong Liu
SSL
17
1
0
07 Dec 2022
Improved Self-Supervised Multilingual Speech Representation Learning
  Combined with Auxiliary Language Information
Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information
Fenglin Ding
Genshun Wan
Pengcheng Li
Jia-Yu Pan
Cong Liu
SSL
11
1
0
07 Dec 2022
Label-free Knowledge Distillation with Contrastive Loss for Light-weight
  Speaker Recognition
Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition
Zhiyuan Peng
Xuanji He
Ke Ding
Tan Lee
Guanglu Wan
10
3
0
06 Dec 2022
Parameter Efficient Transfer Learning for Various Speech Processing
  Tasks
Parameter Efficient Transfer Learning for Various Speech Processing Tasks
Shinta Otake
Rei Kawakami
Nakamasa Inoue
11
16
0
06 Dec 2022
EURO: ESPnet Unsupervised ASR Open-source Toolkit
EURO: ESPnet Unsupervised ASR Open-source Toolkit
Dongji Gao
Jiatong Shi
Shun-Po Chuang
Leibny Paola García-Perera
Hung-yi Lee
Shinji Watanabe
Sanjeev Khudanpur
11
8
0
30 Nov 2022
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech
  Recognition
MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
Xiaohuan Zhou
Jiaming Wang
Zeyu Cui
Shiliang Zhang
Zhijie Yan
Jingren Zhou
Chang Zhou
20
12
0
29 Nov 2022
Model Extraction Attack against Self-supervised Speech Models
Model Extraction Attack against Self-supervised Speech Models
Tsung-Yuan Hsu
Chen An Li
Tung-Yu Wu
Hung-yi Lee
11
1
0
29 Nov 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
11
5
0
24 Nov 2022
Device Directedness with Contextual Cues for Spoken Dialog Systems
Device Directedness with Contextual Cues for Spoken Dialog Systems
Dhanush Bekal
S. Srinivasan
S. Bodapati
S. Ronanki
Katrin Kirchhoff
21
1
0
23 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for
  Speech Representation Learning
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie M. Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
22
37
0
21 Nov 2022
Exploring WavLM on Speech Enhancement
Exploring WavLM on Speech Enhancement
Hyungchan Song
Sanyuan Chen
Zhuo Chen
Yu-Huan Wu
Takuya Yoshioka
M. Tang
Jong Won Shin
Shujie Liu
8
16
0
18 Nov 2022
Compressing Transformer-based self-supervised models for speech
  processing
Compressing Transformer-based self-supervised models for speech processing
Tzu-Quan Lin
Tsung-Huan Yang
Chun-Yao Chang
Kuang-Ming Chen
Tzu-hsun Feng
Hung-yi Lee
Hao Tang
12
6
0
17 Nov 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
MelHuBERT: A simplified HuBERT on Mel spectrograms
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
14
13
0
17 Nov 2022
Improving Speech Emotion Recognition with Unsupervised Speaking Style
  Transfer
Improving Speech Emotion Recognition with Unsupervised Speaking Style Transfer
Leyuan Qu
Wei Wang
C. Weber
F. Ren
Taiha Li
S. Wermter
11
1
0
16 Nov 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by
  Integrating Multiple Targets
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
15
20
0
14 Nov 2022
Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of
  Experts
Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts
Xiaofei Wang
Zhuo Chen
Yu Shi
Jian Wu
Naoyuki Kanda
Takuya Yoshioka
MoE
14
1
0
11 Nov 2022
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models
  for Spoken Language Understanding
A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Yifan Peng
Siddhant Arora
Yosuke Higuchi
Yushi Ueda
Sujay S. Kumar
Karthik Ganesan
Siddharth Dalmia
Xuankai Chang
Shinji Watanabe
11
19
0
10 Nov 2022
Self-supervised learning with bi-label masked speech prediction for
  streaming multi-talker speech recognition
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition
Zili Huang
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yiming Wang
Jinyu Li
Takuya Yoshioka
Xiaofei Wang
Peidong Wang
12
3
0
10 Nov 2022
Speech separation with large-scale self-supervised learning
Speech separation with large-scale self-supervised learning
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yu-Huan Wu
Xiaofei Wang
Takuya Yoshioka
Jinyu Li
S. Sivasankaran
Sefik Emre Eskimez
14
13
0
09 Nov 2022
Distribution-based Emotion Recognition in Conversation
Distribution-based Emotion Recognition in Conversation
Wen Wu
C. Zhang
P. Woodland
19
4
0
09 Nov 2022
Comparative layer-wise analysis of self-supervised speech models
Comparative layer-wise analysis of self-supervised speech models
Ankita Pasad
Bowen Shi
Karen Livescu
SSL
19
109
0
08 Nov 2022
Integrating Voice-Based Machine Learning Technology into Complex Home
  Environments
Integrating Voice-Based Machine Learning Technology into Complex Home Environments
Ye Gao
Jason J. Jabbour
Eun-Jung Ko
L. Wijayasingha
Sooyoung Kim
...
Meiyi Ma
Karen Rose
Kristin D. Gordon
Hongning Wang
John A. Stankovic
13
1
0
06 Nov 2022
Evaluation of Automated Speech Recognition Systems for Conversational
  Speech: A Linguistic Perspective
Evaluation of Automated Speech Recognition Systems for Conversational Speech: A Linguistic Perspective
H. Pasandi
Haniyeh B. Pasandi
11
1
0
05 Nov 2022
Self-Supervised Learning for Speech Enhancement through Synthesis
Self-Supervised Learning for Speech Enhancement through Synthesis
Bryce Irvin
Marko Stamenovic
M. Kegler
Li-Chia Yang
27
18
0
04 Nov 2022
Biased Self-supervised learning for ASR
Biased Self-supervised learning for ASR
Florian Kreyssig
Yangyang Shi
Jinxi Guo
Leda Sari
Abdel-rahman Mohamed
P. Woodland
SSL
14
2
0
04 Nov 2022
Once-for-All Sequence Compression for Self-Supervised Speech Models
Once-for-All Sequence Compression for Self-Supervised Speech Models
Hsuan-Jui Chen
Yen Meng
Hung-yi Lee
6
4
0
04 Nov 2022
Speech-based emotion recognition with self-supervised models using
  attentive channel-wise correlations and label smoothing
Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing
Sofoklis Kakouros
Themos Stafylakis
Ladislav Mošner
L. Burget
16
16
0
03 Nov 2022
data2vec-aqc: Search for the right Teaching Assistant in the
  Teacher-Student training setup
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
25
3
0
02 Nov 2022
Inference and Denoise: Causal Inference-based Neural Speech Enhancement
Inference and Denoise: Causal Inference-based Neural Speech Enhancement
Tsun-An Hsieh
Chao-Han Huck Yang
Pin-Yu Chen
Sabato Marco Siniscalchi
Yu Tsao
CML
37
2
0
02 Nov 2022
Avoid Overthinking in Self-Supervised Models for Speech Recognition
Avoid Overthinking in Self-Supervised Models for Speech Recognition
Dan Berrebbi
Brian Yan
Shinji Watanabe
LRM
9
4
0
01 Nov 2022
Adapting self-supervised models to multi-talker speech recognition using
  speaker embeddings
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
Zili Huang
Desh Raj
Leibny Paola García-Perera
Sanjeev Khudanpur
73
21
0
01 Nov 2022
Speech-text based multi-modal training with bidirectional attention for
  improved speech recognition
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
Yuhang Yang
Haihua Xu
Hao-Ming Huang
E. Chng
Sheng Li
32
7
0
01 Nov 2022
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge
  Distillation
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation
Liyong Guo
Xiaoyu Yang
Quandong Wang
Yuxiang Kong
Zengwei Yao
...
Wei Kang
Long Lin
Mingshuang Luo
Piotr Żelasko
Daniel Povey
VLM
17
7
0
31 Oct 2022
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired
  Speech and Text
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
Xianghu Yue
Junyi Ao
Xiaoxue Gao
Haizhou Li
SSL
19
8
0
30 Oct 2022
Articulatory Representation Learning Via Joint Factor Analysis and
  Neural Matrix Factorization
Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization
Jiachen Lian
A. Black
Yijingxiu Lu
L. Goldstein
Shinji Watanabe
Gopala K. Anumanchipalli
25
14
0
29 Oct 2022
Universal speaker recognition encoders for different speech segments
  duration
Universal speaker recognition encoders for different speech segments duration
Sergey Novoselov
V. Volokhov
G. Lavrentyeva
4
2
0
28 Oct 2022
Parameter-efficient transfer learning of pre-trained Transformer models
  for speaker verification using adapters
Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters
Junyi Peng
Themos Stafylakis
Rongzhi Gu
Oldvrich Plchot
Ladislav Movsner
Lukávs Burget
JanHonza'' vCernocký
29
22
0
28 Oct 2022
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for
  Multimodal Sentiment Analysis
On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis
Atsushi Ando
Ryo Masumura
Akihiko Takashima
Satoshi Suzuki
Naoki Makishima
Keita Suzuki
Takafumi Moriya
Takanori Ashihara
Hiroshi Sato
26
9
0
28 Oct 2022
Evaluating context-invariance in unsupervised speech representations
Evaluating context-invariance in unsupervised speech representations
Mark Hallap
Emmanuel Dupoux
Ewan Dunbar
SSL
23
9
0
27 Oct 2022
Self-supervised language learning from raw audio: Lessons from the Zero
  Resource Speech Challenge
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
19
30
0
27 Oct 2022
Previous
123...1718192021
Next