WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL
arXiv: 2110.13900

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,021 papers shown
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Heming Wang
Meng Yu
H. M. Zhang
Chunlei Zhang
Zhongweiyang Xu
Muqiao Yang
Yixuan Zhang
Dong Yu
21
3
0
16 Sep 2023
Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting
Tiantian Feng
Shrikanth Narayanan
21
16
0
15 Sep 2023
Characterizing the temporal dynamics of universal speech representations for generalizable deepfake detection
Yilun Zhu
S. Powar
Tiago H. Falk
17
2
0
15 Sep 2023
Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for Meeting Transcription
Peter Vieting
Simon Berger
Thilo von Neumann
Christoph Boeddeker
Ralf Schluter
Reinhold Haeb-Umbach
19
0
0
15 Sep 2023
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
Sizhou Chen
Songyang Gao
Sen Fang
8
0
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
22
24
0
14 Sep 2023
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons
Sicheng Yang
Z. Wang
Zhiyong Wu
Minglei Li
Zhensong Zhang
...
Lei Hao
Songcen Xu
Xiaofei Wu
Changpeng Yang
Zonghong Dai
DiffM
39
14
0
13 Sep 2023
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
Zhengyang Chen
Bing Han
Shuai Wang
Yan-min Qian
16
18
0
13 Sep 2023
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Titouan Parcollet
H. Nguyen
Solène Evain
Marcely Zanon Boito
Adrien Pupier
...
François Portet
Solange Rossato
F. Ringeval
D. Schwab
Laurent Besacier
32
15
0
11 Sep 2023
Towards generalisable and calibrated synthetic speech detection with self-supervised representations
Octavian Pascu
Adriana Stan
Dan Oneaţă
Elisabeta Oneata
H. Cucu
SSL
23
5
0
11 Sep 2023
Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023
Haotian Wang
Yuxuan Xi
Hang Chen
Jun Du
Yan Song
...
Pengfei Hu
Ya Jiang
Shi Cheng
Jie M. Zhang
Yuzhe Weng
45
4
0
11 Sep 2023
Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction
Yusuf Brima
U. Krumnack
Simone Pika
Gunther Heidemann
SSL
19
0
0
07 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
24
2
0
06 Sep 2023
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng
Zhifang Guo
Kai Shen
Xu Tan
Zeqian Ju
...
Lei He
Xiang-Yang Li
Sheng Zhao
Tao Qin
Jiang Bian
VLM
DiffM
31
40
0
05 Sep 2023
Bring the Noise: Introducing Noise Robustness to Pretrained Automatic Speech Recognition
Patrick Eickhoff
M. Möller
Theresa Pekarek-Rosin
Johannes Twiefel
Stefan Wermter
12
2
0
05 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
25
4
0
05 Sep 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
Haohan Guo
Fenglong Xie
Jiawen Kang
Yujia Xiao
Xixin Wu
Helen M. Meng
30
3
0
31 Aug 2023
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Xin Zhang
Dong Zhang
Shimin Li
Yaqian Zhou
Xipeng Qiu
25
61
0
31 Aug 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
26
2
0
28 Aug 2023
The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge
Ruoyu Wang
Maokui He
Jun Du
Hengshun Zhou
Shutong Niu
...
Mengzhi Wang
Genshun Wan
Jia Pan
Jianqing Gao
Chin-Hui Lee
17
12
0
28 Aug 2023
Rep2wav: Noise Robust text-to-speech Using self-supervised representations
Qiu-shi Zhu
Yunting Gu
Rilin Chen
Chao Weng
Yuchen Hu
Lirong Dai
Jie M. Zhang
AI4TS
40
3
0
28 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
20
11
0
28 Aug 2023
The DiffuseStyleGesture+ entry to the GENEA Challenge 2023
Sicheng Yang
Haiwei Xue
Zhensong Zhang
Minglei Li
Zhiyong Wu
Xiaofei Wu
Songcen Xu
Zonghong Dai
DiffM
30
15
0
26 Aug 2023
Attention-Based Acoustic Feature Fusion Network for Depression Detection
Xiao Xu
Yang Wang
Xinru Wei
Fei Wang
Xizhe Zhang
6
5
0
24 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Harunori Kawano
Sota Shimizu
17
1
0
22 Aug 2023
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models
Zihan Zhao
Yiyang Jiang
Heyang Liu
Yanfeng Wang
Yu Wang
21
1
0
20 Aug 2023
The DKU-DUKEECE System for the Manipulation Region Location Task of ADD 2023
Zexin Cai
Weiqing Wang
Yikang Wang
Ming Li
17
6
0
20 Aug 2023
Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
Ryandhimas E. Zezario
B. Bai
C. Fuh
Hsin-Min Wang
Yu Tsao
11
3
0
18 Aug 2023
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations
Wen Wu
C. Zhang
P. Woodland
19
3
0
14 Aug 2023
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion
Siyuan Shan
Yang Li
A. Banerjee
Junier B. Oliva
18
4
0
11 Aug 2023
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
Fan Zhang
Naye Ji
Fuxing Gao
Siyuan Zhao
Zhaohan Wang
Shunman Li
24
0
0
11 Aug 2023
End-to-End Evaluation for Low-Latency Simultaneous Speech Translation
Christian Huber
Tu Anh Dinh
Carlos Mullov
Ngoc-Quan Pham
Thai-Binh Nguyen
...
Danni Liu
Zhaolin Li
Sai Koneru
J. Niehues
A. Waibel
10
3
0
07 Aug 2023
Elucidate Gender Fairness in Singing Voice Transcription
Xiangming Gu
Weizhen Zeng
Ye Wang
10
3
0
05 Aug 2023
Federated Representation Learning for Automatic Speech Recognition
Guruprasad V Ramesh
Gopinath Chennupati
Milind Rao
Anit Kumar Sahu
Ariya Rastrow
J. Droppo
18
0
0
03 Aug 2023
Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation
Minsu Kim
J. Choi
Dahun Kim
Y. Ro
33
10
0
03 Aug 2023
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis
Ramanan Sivaguru
Vasista Sai Lodagala
S. Umesh
8
2
0
02 Aug 2023
Mispronunciation detection using self-supervised speech representations
Jazmín Vidal
Pablo Riera
Luciana Ferrer
6
1
0
30 Jul 2023
UniBriVL: Robust Universal Representation and Generation of Audio Driven Diffusion Models
Sen Fang
Bowen Gao
Yangjian Wu
T. Teoh
DiffM
18
1
0
29 Jul 2023
The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions
George Close
Thomas Hain
Stefan Goetze
21
8
0
27 Jul 2023
Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains
Martin Lebourdais
Théo Mariotte
Marie Tahon
Anthony Larcher
Antoine Laurent
Silvio Montrésor
S. Meignier
Jean-Hugh Thomas
VLM
25
5
0
24 Jul 2023
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Wangyou Zhang
Samuele Cornell
Zhongqiu Wang
Nobutaka Ono
Y. Qian
Shinji Watanabe
23
6
0
23 Jul 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
23
34
0
20 Jul 2023
Improving Domain Generalization for Sound Classification with Sparse Frequency-Regularized Transformer
Honglin Mu
Wentian Xia
Wanxiang Che
8
1
0
19 Jul 2023
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs
Yinghao Aaron Li
Cong Han
N. Mesgarani
18
5
0
18 Jul 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
22
41
0
14 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
34
5
0
11 Jul 2023
Large AI Model-Based Semantic Communications
Feibo Jiang
Yubo Peng
Li Dong
Kezhi Wang
Kun Yang
Cunhua Pan
Xiaohu You
25
47
0
07 Jul 2023
On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
Gene-Ping Yang
Yue Gu
Qingming Tang
Dongsu Du
Yuzong Liu
8
5
0
06 Jul 2023
Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data
Hongmin Cai
Xiaoke Huang
Zheng Liu
Wenxiong Liao
Haixing Dai
...
Dajiang Zhu
Hui Ren
Quanzheng Li
Tianming Liu
Xiang Li
20
17
0
05 Jul 2023
Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data
Guangzhi Sun
C. Zhang
Ivan Vulić
Paweł Budzianowski
P. Woodland
18
6
0
04 Jul 2023