ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.13900
  4. Cited By
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
    SSL
ArXivPDFHTML

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,022 papers shown
Title
Leveraging Diverse Semantic-based Audio Pretrained Models for Singing
  Voice Conversion
Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion
Xueyao Zhang
Yicheng Gu
Haopeng Chen
Zihao Fang
Lexiao Zou
Junan Zhang
Liumeng Xue
Jinchao Zhang
Jie Zhou
Zhizheng Wu
DiffM
17
1
0
17 Oct 2023
Spatial HuBERT: Self-supervised Spatial Speech Representation Learning
  for a Single Talker from Multi-channel Audio
Spatial HuBERT: Self-supervised Spatial Speech Representation Learning for a Single Talker from Multi-channel Audio
Antoni Dimitriadis
Siqi Pan
V. Sethu
Beena Ahmed
SSL
12
3
0
17 Oct 2023
Optimized Tokenization for Transcribed Error Correction
Optimized Tokenization for Transcribed Error Correction
Tomer Wullach
Shlomo E. Chazan
19
0
0
16 Oct 2023
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
20
8
0
16 Oct 2023
Toward Joint Language Modeling for Speech Units and Text
Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
23
19
0
12 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
Chanho Park
Chengsong Lu
Mingjie Chen
Thomas Hain
15
3
0
12 Oct 2023
Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and
  Textually Described Voices
Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices
Matthew Baas
Herman Kamper
13
3
0
12 Oct 2023
Vec-Tok Speech: speech vectorization and tokenization for neural speech
  generation
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
Xinfa Zhu
Yuanjun Lv
Yinjiao Lei
Tao Li
Wendi He
Hongbin Zhou
Heng Lu
Lei Xie
29
16
0
11 Oct 2023
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Jiatong Shi
William Chen
Dan Berrebbi
Hsiu-Hsuan Wang
Wei-Ping Huang
...
Yuxun Tang
Shang-Wen Li
Abdelrahman Mohamed
Hung-yi Lee
Shinji Watanabe
LRM
ELM
34
15
0
09 Oct 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech
  segmentation into words
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words
Robin Algayres
Pablo Diego-Simon
Benoît Sagot
Emmanuel Dupoux
28
1
0
08 Oct 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech
  and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge
  2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023
Ryuichi Yamamoto
Reo Yoneyama
Lester Phillip Violeta
Wen-Chin Huang
T. Toda
8
7
0
08 Oct 2023
SALT: Distinguishable Speaker Anonymization Through Latent Space
  Transformation
SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation
Yuanjun Lv
Jixun Yao
Peikun Chen
Hongbin Zhou
Heng Lu
Lei Xie
25
4
0
08 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
31
79
0
07 Oct 2023
Transferring speech-generic and depression-specific knowledge for
  Alzheimer's disease detection
Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection
Ziyun Cui
Wen Wu
Wei-Qiang Zhang
Ji Wu
Chao Zhang
15
2
0
06 Oct 2023
HuBERTopic: Enhancing Semantic Representation of HuBERT through
  Self-supervision Utilizing Topic Model
HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model
Takashi Maekaku
Jiatong Shi
Xuankai Chang
Yuya Fujita
Shinji Watanabe
23
1
0
06 Oct 2023
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low
  Resource and Multilingual Scenarios
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Low Resource and Multilingual Scenarios
Tejes Srivastava
Jiatong Shi
William Chen
Shinji Watanabe
32
1
0
05 Oct 2023
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech
  Model
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model
Kai-Wei Chang
Ming-Hsin Chen
Yun-Ping Lin
Jing Neng Hsu
Paul Kuo-Ming Huang
Chien-yu Huang
Shang-Wen Li
Hung-yi Lee
21
6
0
04 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised
  Learning with Masked Unit Prediction
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
H. Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
43
24
0
04 Oct 2023
Audio-visual child-adult speaker classification in dyadic interactions
Audio-visual child-adult speaker classification in dyadic interactions
Anfeng Xu
Kevin Huang
Tiantian Feng
Helen Tager-Flusberg
Shrikanth Narayanan
6
3
0
03 Oct 2023
One model to rule them all ? Towards End-to-End Joint Speaker
  Diarization and Speech Recognition
One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition
Samuele Cornell
Jee-weon Jung
Shinji Watanabe
S. Squartini
VLM
12
15
0
02 Oct 2023
It HAS to be Subjective: Human Annotator Simulation via Zero-shot
  Density Estimation
It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation
Wen Wu
Wenlin Chen
C. Zhang
P. Woodland
13
1
0
30 Sep 2023
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and
  General Domain ASR
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR
Tobi Olatunji
Tejumade Afonja
Aditya Yadavalli
Chris C. Emezue
Sahib Singh
...
Joanne I. Osuchukwu
Salomey Osei
A. Tonja
Naome A. Etori
Clinton Mbataku
17
13
0
30 Sep 2023
Improving Audio Captioning Models with Fine-grained Audio Features, Text
  Embedding Supervision, and LLM Mix-up Augmentation
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Shih-Lun Wu
Xuankai Chang
G. Wichern
Jee-weon Jung
Franccois G. Germain
Jonathan Le Roux
Shinji Watanabe
10
16
0
29 Sep 2023
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS
Xin Wang
Taein Kwon
Wei-Ning Hsu
Yossi Adi
Tu Nguyen
D. Bohus
Emmanuel Dupoux
Neel Joshi
Abdelrahman Mohamed
10
4
0
29 Sep 2023
Meeting Recognition with Continuous Speech Separation and
  Transcription-Supported Diarization
Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
Thilo von Neumann
Christoph Boeddeker
Tobias Cord-Landwehr
Marc Delcroix
Reinhold Haeb-Umbach
16
7
0
28 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard
  Parameter Sharing
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
B. Grimstad
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
13
2
0
27 Sep 2023
Exploring Speech Recognition, Translation, and Understanding with
  Discrete Speech Units: A Comparative Study
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Xuankai Chang
Brian Yan
Kwanghee Choi
Jee-weon Jung
Yichen Lu
...
Pengcheng Guo
Yao-Fei Cheng
Pavel Denisov
Kohei Saijo
Hsiu-Hsuan Wang
26
36
0
27 Sep 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with
  Large Language Models
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Cheng Chen
Yuchen Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Pin-Yu Chen
E. Chng
21
42
0
27 Sep 2023
Joint Prediction and Denoising for Large-scale Multilingual
  Self-supervised Learning
Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning
William Chen
Jiatong Shi
Brian Yan
Dan Berrebbi
Wangyou Zhang
Yifan Peng
Xuankai Chang
Soumi Maiti
Shinji Watanabe
24
8
0
26 Sep 2023
Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for
  Automatic Speaker Verification
Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification
Duc-Tuan Truong
Ruijie Tao
J. Yip
Kong Aik Lee
Chng Eng Siong
11
6
0
26 Sep 2023
Online Active Learning For Sound Event Detection
Online Active Learning For Sound Event Detection
Mark Lindsey
Ankit Shah
Francis Kubala
R. M. Stern
17
0
0
25 Sep 2023
Joint Audio and Speech Understanding
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
21
66
0
25 Sep 2023
Unsupervised Accent Adaptation Through Masked Language Model Correction
  Of Discrete Self-Supervised Speech Units
Unsupervised Accent Adaptation Through Masked Language Model Correction Of Discrete Self-Supervised Speech Units
Jakob Poncelet
Hugo Van hamme
16
3
0
25 Sep 2023
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech
  Data
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Jianwei Yu
Hangting Chen
Yanyao Bian
Xiang Li
Yimin Luo
Jinchuan Tian
Mengyang Liu
Jiayi Jiang
Shuai Wang
VLM
13
12
0
25 Sep 2023
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech
  Representation Learning
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
Guan-lin Yang
Ziyang Ma
Zhisheng Zheng
Ya-Zhen Song
Zhikang Niu
Xie Chen
25
8
0
25 Sep 2023
Human Transcription Quality Improvement
Human Transcription Quality Improvement
Jian Gao
Hanbo Sun
Cheng Cao
Zheng Du
27
2
0
24 Sep 2023
VIC-KD: Variance-Invariance-Covariance Knowledge Distillation to Make
  Keyword Spotting More Robust Against Adversarial Attacks
VIC-KD: Variance-Invariance-Covariance Knowledge Distillation to Make Keyword Spotting More Robust Against Adversarial Attacks
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Tiago H. Falk
AAML
13
0
0
22 Sep 2023
Big model only for hard audios: Sample dependent Whisper model selection
  for efficient inferences
Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences
Hugo Malard
Salah Zaiem
Robin Algayres
23
2
0
22 Sep 2023
NTT speaker diarization system for CHiME-7: multi-domain,
  multi-microphone End-to-end and vector clustering diarization
NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization
Naohiro Tawara
Marc Delcroix
Atsushi Ando
A. Ogawa
38
7
0
22 Sep 2023
Audio Contrastive based Fine-tuning
Audio Contrastive based Fine-tuning
Yang Wang
Qibin Liang
Chenghao Xiao
Yizhi Li
Noura Al Moubayed
Chenghua Lin
8
0
0
21 Sep 2023
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in
  Speaker Recognition
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition
Shuai Wang
Qibing Bai
Qi Liu
Jianwei Yu
Zhengyang Chen
Bing Han
Yan-min Qian
Haizhou Li
19
1
0
21 Sep 2023
Leveraging Data Collection and Unsupervised Learning for Code-switched
  Tunisian Arabic Automatic Speech Recognition
Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition
Ahmed Amine Ben Abdallah
Ata Kabboudi
Amir Kanoun
Salah Zaiem
14
1
0
20 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for
  Speaker and Speech Recognition
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
19
12
0
19 Sep 2023
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual
  Representation Models
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
44
14
0
19 Sep 2023
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion
  Recognition
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition
Ziyang Ma
Wen Wu
Zhisheng Zheng
Yiwei Guo
Qian Chen
Shiliang Zhang
Xie Chen
21
14
0
19 Sep 2023
HTEC: Human Transcription Error Correction
HTEC: Human Transcription Error Correction
Hanbo Sun
Jian Gao
Xiaomin Wu
Anjie Fang
Cheng Cao
Zheng Du
19
1
0
18 Sep 2023
A Multitask Training Approach to Enhance Whisper with Contextual Biasing
  and Open-Vocabulary Keyword Spotting
A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting
Yuang Li
Min Zhang
Chang Su
Yinglu Li
Xiaosong Qiao
Mengxin Ren
Miaomiao Ma
Daimeng Wei
Shimin Tao
Hao-Yu Yang
17
5
0
18 Sep 2023
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using
  Whisper and Metadata
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata
Ryandhimas E. Zezario
Fei Chen
C. Fuh
H. Wang
Yu Tsao
32
1
0
18 Sep 2023
Training dynamic models using early exits for automatic speech
  recognition on resource-constrained devices
Training dynamic models using early exits for automatic speech recognition on resource-constrained devices
George August Wright
Umberto Cappellazzo
Salah Zaiem
Desh Raj
Lucas Ondel Yang
Daniele Falavigna
Mohamed Nabih Ali
A. Brutti
15
2
0
18 Sep 2023
Are Soft Prompts Good Zero-shot Learners for Speech Recognition?
Are Soft Prompts Good Zero-shot Learners for Speech Recognition?
Dianwen Ng
Chong Zhang
Ruixi Zhang
Yukun Ma
Fabian Ritter Gutierrez
Trung Hieu Nguyen
Chongjia Ni
Shengkui Zhao
E. Chng
B. Ma
VLM
27
1
0
18 Sep 2023
Previous
123...121314...192021
Next