
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,021 papers shown
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History
Siddhant Arora
Hayato Futami
E. Tsunoo
Brian Yan
Shinji Watanabe
30
4
0
01 May 2023
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System
Junke Wang
Dongdong Chen
Chong Luo
Xiyang Dai
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
93
54
0
27 Apr 2023
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Siwei Lyu
17
38
0
25 Apr 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
17
6
0
22 Apr 2023
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale
Cal Peyser
M. Picheny
Kyunghyun Cho
Rohit Prabhavalkar
Ronny Huang
Tara N. Sainath
AI4TS
14
1
0
19 Apr 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen
Zeqian Ju
Xu Tan
Yanqing Liu
Yichong Leng
Lei He
Tao Qin
Sheng Zhao
Jiang Bian
DiffM
8
219
0
18 Apr 2023
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers
Juan Pablo Zuluaga
Amrutha Prasad
Iuliia Nigmatulina
P. Motlícek
Matthias Kleinert
11
21
0
16 Apr 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
E. Chng
18
15
0
11 Apr 2023
Pac-HuBERT: Self-Supervised Music Source Separation via Primitive Auditory Clustering and Hidden-Unit BERT
K. Chen
G. Wichern
François G. Germain
Jonathan Le Roux
AI4TS
12
0
0
04 Apr 2023
Self-supervised Learning with Speech Modulation Dropout
Samik Sadhu
H. Hermansky
SSL
8
0
0
22 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
33
46
0
21 Mar 2023
Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
Maryam Fazel-Zarandi
Wei-Ning Hsu
SSL
14
8
0
20 Mar 2023
Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition
Xiaoyu Yang
Qiujia Li
C. Zhang
P. Woodland
10
6
0
20 Mar 2023
DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model
Yanzhe Fu
Yueteng Kang
Songjun Cao
Long Ma
6
7
0
16 Mar 2023
Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022
Taras Kucherenko
Pieter Wolfert
Youngwoo Yoon
Carla Viegas
Teodor Nikolov
Mihail Tsakov
G. Henter
30
24
0
15 Mar 2023
Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection
Jinchao Li
Kaitao Song
Junan Li
Bo Zheng
Dongsheng Li
Xixin Wu
Xunying Liu
Helen M. Meng
32
12
0
14 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
58
14
0
14 Mar 2023
Analysing the Masked predictive coding training criterion for pre-training a Speech Representation Model
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
16
4
0
13 Mar 2023
Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study
Salah Zaiem
Robin Algayres
Titouan Parcollet
S. Essid
Mirco Ravanelli
35
14
0
12 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
16
10
0
12 Mar 2023
Exploring Efficient-Tuned Learning Audio Representation Method from BriVL
Sen Fang
Yang Wu
Bowen Gao
Jingwen Cai
T. Teoh
DiffM
11
1
0
08 Mar 2023
Self-supervised speech representation learning for keyword-spotting with light-weight transformers
Chenyang Gao
Yue Gu
Francesco Calivá
Yuzong Liu
OffRL
22
3
0
07 Mar 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
21
170
0
07 Mar 2023
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
Christoph Boeddeker
Aswin Shanmugam Subramanian
G. Wichern
Reinhold Haeb-Umbach
Jonathan Le Roux
18
22
0
07 Mar 2023
DWFormer: Dynamic Window transFormer for Speech Emotion Recognition
Shuaiqi Chen
Xiaofen Xing
Weibin Zhang
Weidong Chen
Xiangmin Xu
17
15
0
03 Mar 2023
Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding
Yingting Li
Ambuj Mehrish
Shuaijiang Zhao
Rishabh Bhardwaj
Amir Zadeh
Navonil Majumder
Rada Mihalcea
Soujanya Poria
AAML
8
15
0
02 Mar 2023
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain
Jaesung Huh
Tengda Han
Andrew Zisserman
15
198
0
01 Mar 2023
Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition
Shujie Hu
Xurong Xie
Zengrui Jin
Mengzhe Geng
Yi Wang
Mingyu Cui
Jiajun Deng
Xunying Liu
Helen M. Meng
11
30
0
28 Feb 2023
Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding
Yifan Peng
Kwangyoun Kim
Felix Wu
Prashant Sridhar
Shinji Watanabe
11
34
0
27 Feb 2023
DST: Deformable Speech Transformer for Emotion Recognition
Weidong Chen
Xiaofen Xing
Xiangmin Xu
Jianxin Pang
Lan Du
17
21
0
27 Feb 2023
Mingling or Misalignment? Temporal Shift for Speech Emotion Recognition with Pre-trained Representations
Siyuan Shen
Feng Liu
Aimin Zhou
14
15
0
26 Feb 2023
Phone and speaker spatial organization in self-supervised speech representations
Pablo Riera
M. Cerdeiro
L. Pepino
Luciana Ferrer
SSL
13
1
0
24 Feb 2023
Improving Massively Multilingual ASR With Auxiliary CTC Objectives
William Chen
Brian Yan
Jiatong Shi
Yifan Peng
Soumi Maiti
Shinji Watanabe
36
30
0
24 Feb 2023
Towards multi-task learning of speech and speaker recognition
Nik Vaessen
David A. van Leeuwen
CVBM
6
0
0
24 Feb 2023
Ensemble knowledge distillation of self-supervised speech models
Kuan-Po Huang
Tzu-hsun Feng
Yu-Kuan Fu
Tsung-Yuan Hsu
Po-Chieh Yen
Wei-Cheng Tseng
Kai-Wei Chang
Hung-yi Lee
20
16
0
24 Feb 2023
VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
Jaesung Huh
A. Brown
Jee-weon Jung
Joon Son Chung
Arsha Nagrani
D. Garcia-Romero
Andrew Zisserman
13
26
0
20 Feb 2023
RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Mehdi Rezagholizadeh
Boxing Chen
Tiago H. Falk
32
10
0
18 Feb 2023
Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition
Xie Chen
Ziyang Ma
Changli Tang
Yujin Wang
Zhi-shen Zheng
8
4
0
18 Feb 2023
Speaker Change Detection for Transformer Transducer ASR
Jian Wu
Zhuo Chen
Min Hu
Xiong Xiao
Jinyu Li
8
4
0
16 Feb 2023
PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement
Muqiao Yang
Joseph Konan
David Bick
Yunyang Zeng
Shuo Han
Anurag Kumar
Shinji Watanabe
Bhiksha Raj
14
5
0
16 Feb 2023
The Framework Tax: Disparities Between Inference Efficiency in NLP Research and Deployment
Jared Fernandez
Jacob Kahn
Clara Na
Yonatan Bisk
Emma Strubell
FedML
10
10
0
13 Feb 2023
ASR Bundestag: A Large-Scale political debate dataset in German
Johannes Wirth
René Peinl
13
1
0
12 Feb 2023
Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation
Cong Han
Vishal B. Choudhari
Yinghao Aaron Li
N. Mesgarani
11
3
0
11 Feb 2023
Cross-Modal Fine-Tuning: Align then Refine
Junhong Shen
Liam Li
Lucio Dery
Corey Staten
M. Khodak
Graham Neubig
Ameet Talwalkar
22
33
0
11 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
27
32
0
10 Feb 2023
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
Eugene Kharitonov
Damien Vincent
Zalan Borsos
Raphaël Marinier
Sertan Girgin
Olivier Pietquin
Matthew Sharifi
Marco Tagliasacchi
Neil Zeghidour
9
189
0
07 Feb 2023
Automatic Sleep Stage Classification with Cross-modal Self-supervised Features from Deep Brain Signals
Chen Gong
Yue Chen
Yanan Sui
Luming Li
14
0
0
07 Feb 2023
Dual Learning for Large Vocabulary On-Device ASR
Cal Peyser
Ronny Huang
Tara N. Sainath
Rohit Prabhavalkar
M. Picheny
K. Cho
SSL
11
1
0
11 Jan 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
16
637
0
05 Jan 2023
EmoGator: A New Open Source Vocal Burst Dataset with Baseline Machine Learning Classification Methodologies
F. Buhl
VLM
11
1
0
02 Jan 2023