ResearchTrend.AI
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,021 papers shown
Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure
Yikang Wang
Hiromitsu Nishizaki
Ming Li
18
0
0
04 Jul 2023
Semantic enrichment towards efficient speech representations
G. Laperriere
H. Nguyen
Sahar Ghannay
Bassam Jabaian
Yannick Esteve
38
2
0
03 Jul 2023
Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters
Anshu Bhatia
Sanchit Sinha
Saket Dingliwal
Karthik Gopalakrishnan
S. Bodapati
Katrin Kirchhoff
17
6
0
02 Jul 2023
VoxWatch: An open-set speaker recognition benchmark on VoxCeleb
Raghuveer Peri
S. O. Sadjadi
D. Garcia-Romero
14
3
0
30 Jun 2023
What Do Self-Supervised Speech Models Know About Words?
Ankita Pasad
C. Chien
Shane Settle
Karen Livescu
SSL
16
26
0
30 Jun 2023
Beyond Neural-on-Neural Approaches to Speaker Gender Protection
L. V. Bemmel
Zhuoran Liu
Nik Vaessen
Martha Larson
AAML
13
2
0
30 Jun 2023
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
Le Zhuo
Ruibin Yuan
Jiahao Pan
Yi Ma
Yizhi Li
...
Chenghua Lin
Emmanouil Benetos
Wenhu Chen
Wei Xue
Yi-Ting Guo
20
15
0
29 Jun 2023
Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound
Xinfeng Li
Junning Ze
Chen Yan
Yushi Cheng
Xiaoyu Ji
Wenyuan Xu
AAML
23
11
0
28 Jun 2023
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Siqi Zheng
Luyao Cheng
Yafeng Chen
Haibo Wang
Qian Chen
8
15
0
27 Jun 2023
Wespeaker baselines for VoxSRC2023
Shuai Wang
Che-Yuan Liang
Xu Xiang
Bing Han
Zhengyang Chen
Hongji Wang
Wen Ding
14
0
0
27 Jun 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
T. Toda
8
45
0
26 Jun 2023
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios
Samuele Cornell
Matthew Wiesner
Shinji Watanabe
Desh Raj
Xuankai Chang
...
Matthew Maciejewski
Yoshiki Masuyama
Zhong-Qiu Wang
S. Squartini
Sanjeev Khudanpur
11
51
0
23 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
17
263
0
23 Jun 2023
Speech Emotion Diarization: Which Emotion Appears When?
Yingzhi Wang
Mirco Ravanelli
Alya Yacoubi
10
11
0
22 Jun 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA
AuLLM
VLM
35
256
0
22 Jun 2023
Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies
Yuya Yamamoto
12
2
0
22 Jun 2023
Federated Self-Learning with Weak Supervision for Speech Recognition
Milind Rao
Gopinath Chennupati
Gautam Tiwari
Anit Kumar Sahu
A. Raju
Ariya Rastrow
J. Droppo
13
3
0
21 Jun 2023
Evaluation of Speech Representations for MOS prediction
F. S. Oliveira
Edresson Casanova
Arnaldo Cândido Júnior
L. Gris
A. S. Soares
A. R. G. Filho
19
4
0
16 Jun 2023
Multi-Loss Convolutional Network with Time-Frequency Attention for Speech Enhancement
Liang Wan
Hongqing Liu
Yi Zhou
Jie Ji
17
2
0
15 Jun 2023
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
Ziyang Ma
Zhisheng Zheng
Guanrou Yang
Yu Wang
C. Zhang
Xie Chen
SSL
14
8
0
15 Jun 2023
Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement
Hejung Yang
Hong-Goo Kang
SSL
15
0
0
14 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
Yusuke Ijima
Taichi Asami
Marc Delcroix
Yukinori Honma
SSL
ELM
19
11
0
14 Jun 2023
GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition
Y. Pan
Yanni Hu
Yuguang Yang
Wen Fei
Jixun Yao
Heng Lu
Lei Ma
Jianjun Zhao
VLM
54
8
0
13 Jun 2023
Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content
Tiantian Feng
Digbalay Bose
Xuan Shi
Shrikanth Narayanan
18
1
0
13 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
21
107
0
13 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
13
167
0
11 Jun 2023
Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression
Wen Wu
C. Zhang
P. Woodland
UQCV
UD
EDL
12
11
0
11 Jun 2023
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute
William Chen
Xuankai Chang
Yifan Peng
Zhaoheng Ni
Soumi Maiti
Shinji Watanabe
SSL
12
25
0
11 Jun 2023
Mandarin Electrolaryngeal Speech Voice Conversion using Cross-domain Features
Hsin-Hao Chen
Yung-Lun Chien
Ming-Chi Yen
S. Tsai
Yu Tsao
T. Chi
Hsin-Min Wang
9
2
0
11 Jun 2023
PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models
Tiantian Feng
Shrikanth Narayanan
14
24
0
08 Jun 2023
KIT's Multilingual Speech Translation System for IWSLT 2023
Danni Liu
Thai-Binh Nguyen
Sai Koneru
Enes Yavuz Ugan
Ngoc-Quan Pham
Tuan-Nam Nguyen
Tu Anh Dinh
Carlos Mullov
A. Waibel
J. Niehues
13
6
0
08 Jun 2023
RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain
Sangeet Sagar
Mirco Ravanelli
B. Kiefer
Ivana Kruijff Korbayova
Josef van Genabith
11
1
0
06 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
21
72
0
06 Jun 2023
Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis
Zhe Ye
Ziyue Jiang
Yi Ren
Jinglin Liu
Chen Zhang
Xiang Yin
Zejun Ma
Zhou Zhao
27
4
0
06 Jun 2023
In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis
N. Prabhu
N. Lehmann-Willenbrock
Timo Gerkmann
14
3
0
02 Jun 2023
Task-Agnostic Structured Pruning of Speech Representation Models
Haoyu Wang
Siyuan Wang
Weiqiang Zhang
Hongbin Suo
Yulong Wan
VLM
14
14
0
02 Jun 2023
DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model
Haoyu Wang
Siyuan Wang
Weiqiang Zhang
Jinfeng Bai
19
2
0
02 Jun 2023
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics
Joonyong Park
Shinnosuke Takamichi
Tomohiko Nakamura
Kentaro Seki
Detai Xin
Hiroshi Saruwatari
AuLLM
9
3
0
01 Jun 2023
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset
Lucas Maison
Yannick Esteve
16
3
0
01 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Z. Tan
15
7
0
01 Jun 2023
Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations
Salah Zaiem
Titouan Parcollet
S. Essid
24
2
0
01 Jun 2023
Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
4
23
0
01 Jun 2023
AfriNames: Most ASR models "butcher" African Names
Tobi Olatunji
Tejumade Afonja
Bonaventure F. P. Dossou
A. Tonja
Chris C. Emezue
Amina Mardiyyah Rufai
Sahib Singh
19
5
0
01 Jun 2023
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Yizhi Li
Ruibin Yuan
Ge Zhang
Yi Ma
Xingran Chen
...
Yemin Shi
Wen-Fen Huang
Zili Wang
Yi-Ting Guo
Jie Fu
20
104
0
31 May 2023
MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models
Yu-Hsiang Wang
Huan Chen
Kai-Wei Chang
Winston H. Hsu
Hung-yi Lee
10
6
0
30 May 2023
Voice Conversion With Just Nearest Neighbors
Matthew Baas
Benjamin van Niekerk
Herman Kamper
SSL
30
48
0
30 May 2023
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
Chenda Li
Yao Qian
Zhuo Chen
Naoyuki Kanda
Dongmei Wang
Takuya Yoshioka
Y. Qian
Michael Zeng
21
10
0
30 May 2023
A Hierarchical Context-aware Modeling Approach for Multi-aspect and Multi-granular Pronunciation Assessment
Fu-An Chao
Tien-Hong Lo
Tzu-I Wu
Yao-Ting Sung
Berlin Chen
18
7
0
29 May 2023
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
Xuankai Chang
Brian Yan
Yuya Fujita
Takashi Maekaku
Shinji Watanabe
14
37
0
29 May 2023
An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings
L. Serafini
Samuele Cornell
Giovanni Morrone
Enrico Zovato
A. Brutti
S. Squartini
27
9
0
29 May 2023