WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,021 papers shown
Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
Rui Liu
Zening Ma
SSL
29
1
0
10 Jun 2024
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
Yanis Labrak
Adel Moumen
Richard Dufour
Mickael Rouvier
ELM
LM&MA
MedIm
29
0
0
09 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
36
2
0
09 Jun 2024
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
Zirun Zhu
...
Yufei Xia
Jinzhu Li
Sheng Zhao
Jinyu Li
Naoyuki Kanda
27
3
0
09 Jun 2024
SPA-SVC: Self-supervised Pitch Augmentation for Singing Voice Conversion
Bingsong Bai
Fengping Wang
Yingming Gao
Ya Li
33
0
0
09 Jun 2024
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
39
1
0
09 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
32
15
0
08 Jun 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Avihu Dekel
Raul Fernandez
30
2
0
08 Jun 2024
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
T. Lin
Hung-yi Lee
Hao Tang
22
1
0
08 Jun 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Sanyuan Chen
Shujie Liu
Long Zhou
Yanqing Liu
Xu Tan
Jinyu Li
Sheng Zhao
Yao Qian
Furu Wei
VLM
29
64
0
08 Jun 2024
To what extent can ASV systems naturally defend against spoofing attacks?
Jee-weon Jung
Xin Eric Wang
Nicholas W. D. Evans
Shinji Watanabe
Hye-jin Shim
Hemlata Tak
Siddhant Arora
Junichi Yamagishi
Joon Son Chung
AAML
30
3
0
08 Jun 2024
XANE: eXplainable Acoustic Neural Embeddings
Sri Harsha Dumpala
D. Sharma
Chandramouli Shama Sastri
S. Kruchinin
James Fosburgh
Patrick A. Naylor
16
2
0
07 Jun 2024
On the social bias of speech self-supervised models
Yi-Cheng Lin
T. Lin
Hsi-Che Lin
Andy T. Liu
Hung-yi Lee
32
3
0
07 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
32
65
0
07 Jun 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Sreyan Ghosh
Sonal Kumar
Ashish Seth
Purva Chiniya
Utkarsh Tyagi
R. Duraiswami
Dinesh Manocha
41
0
0
06 Jun 2024
BLSP-Emo: Towards Empathetic Large Speech-Language Models
Chen Wang
Minpeng Liao
Zhongqiang Huang
Junhong Wu
Chengqing Zong
Jiajun Zhang
VLM
AuLLM
38
4
0
06 Jun 2024
Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection
Xiaopeng Wang
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
Yuankun Xie
...
Xuefei Liu
Yongwei Li
Xin Qi
Yi Lu
Shuchen Shi
28
4
0
05 Jun 2024
Dataset-Distillation Generative Model for Speech Emotion Recognition
Fabian Ritter Gutierrez
Kuan Po Huang
Jeremy H. M Wong
Dianwen Ng
Hung-yi Lee
Nancy F. Chen
Eng Siong Chng
DD
32
0
0
05 Jun 2024
ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization
Bi-Cheng Yan
Wei-Cheng Chao
Jiun-Ting Li
Yi-Cheng Wang
Hsin-Wei Wang
Meng-Shin Lin
Berlin Chen
18
0
0
05 Jun 2024
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Yongyi Zang
Jiatong Shi
You Zhang
Ryuichi Yamamoto
Jionghao Han
...
Shengyuan Xu
Wenxiao Zhao
Jing Guo
T. Toda
Zhiyao Duan
26
10
0
04 Jun 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
J. Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Y. Zhang
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
38
74
0
04 Jun 2024
Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models
Victor Miara
Theo Lepage
Reda Dehak
24
1
0
04 Jun 2024
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
Sarthak Yadav
Z. Tan
Mamba
23
10
0
04 Jun 2024
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
Kun Zhou
Shengkui Zhao
Yukun Ma
Chong Zhang
Hao Wang
Dianwen Ng
Chongjia Ni
Nguyen Trung Hieu
J. Yip
Bin Ma
20
5
0
04 Jun 2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
Shengpeng Ji
Jia-li Zuo
Minghui Fang
Siqi Zheng
Qian Chen
...
Ziyue Jiang
Hai Huang
Xize Cheng
Rongjie Huang
Zhou Zhao
45
7
0
03 Jun 2024
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
Yongxin Zhu
Dan Su
Liqiang He
Linli Xu
Dong Yu
31
5
0
03 Jun 2024
YODAS: Youtube-Oriented Dataset for Audio and Speech
Xinjian Li
Shinnosuke Takamichi
Takaaki Saeki
William Chen
Sayaka Shiota
Shinji Watanabe
38
16
0
02 Jun 2024
SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
Hongyu Gong
Bandhav Veluri
38
0
0
30 May 2024
Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting
Ihab Asaad
Maxime Jacquelin
Olivier Perrotin
Laurent Girin
Thomas Hueber
25
0
0
30 May 2024
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem
Mingjie Chen
Hezhao Zhang
Yuanchao Li
Jiachen Luo
Wen Wu
...
Lin Wang
P. Woodland
Xie Chen
Huy P Phan
Thomas Hain
18
0
0
30 May 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
Chenyang Le
Yao Qian
Dongmei Wang
Long Zhou
Shujie Liu
...
Midia Yousefi
Yanmin Qian
Jinyu Li
Sheng Zhao
Michael Zeng
34
3
0
28 May 2024
CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning
Medha Hira
Arnav Goel
Anubha Gupta
20
1
0
23 May 2024
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
Yuchen Hu
Chen Chen
Chao-Han Huck Yang
Chengwei Qin
Pin-Yu Chen
Chng Eng Siong
Chao Zhang
VLM
33
3
0
23 May 2024
SIGGesture: Generalized Co-Speech Gesture Synthesis via Semantic Injection with Large-Scale Pre-Training Diffusion Models
Qingrong Cheng
Xu Li
Xinghui Fu
DiffM
27
2
0
22 May 2024
A Novel Fusion Architecture for PD Detection Using Semi-Supervised Speech Embeddings
Tariq Adnan
Abdelrahman Abdelkader
Zipei Liu
Ekram Hossain
Sooyong Park
Md. Saiful Islam
Ehsan Hoque
25
2
0
21 May 2024
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
Hanlei Zhang
Hua Xu
Fei Long
Xin Wang
Kai Gao
33
3
0
21 May 2024
Mamba in Speech: Towards an Alternative to Self-Attention
Xiangyu Zhang
Qiquan Zhang
Hexin Liu
Tianyi Xiao
Xinyuan Qian
Beena Ahmed
E. Ambikairajah
Haizhou Li
Julien Epps
Mamba
47
36
0
21 May 2024
Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification
Nian Li
Jianguo Wei
ViT
22
0
0
20 May 2024
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
Siyang Wang
Éva Székely
33
4
0
16 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
39
37
0
14 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
30
81
0
09 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
36
14
0
08 May 2024
Adapting WavLM for Speech Emotion Recognition
Daria Diatlova
Anton Udalov
Vitalii Shutov
Egor Spirin
28
4
0
07 May 2024
HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
Zhongren Dong
Zixing Zhang
Weixiang Xu
Jing Han
Jianjun Ou
Björn W. Schuller
35
1
0
07 May 2024
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Bingshen Mu
Yangze Li
Qijie Shao
Kun Wei
Xucheng Wan
Naijun Zheng
Huan Zhou
Lei Xie
40
5
0
06 May 2024
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models
Alessandro Pianese
D. Cozzolino
Giovanni Poggi
L. Verdoliva
19
5
0
03 May 2024
GMP-TL: Gender-augmented Multi-scale Pseudo-label Enhanced Transfer Learning for Speech Emotion Recognition
Yu Pan
Yuguang Yang
Heng Lu
Lei Ma
Jianjun Zhao
37
1
0
03 May 2024
Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment
Aditya Chakravarty
21
0
0
02 May 2024
Efficient Compression of Multitask Multilingual Speech Models
Thomas Palmeira Ferraz
39
0
0
02 May 2024
Benchmarking Representations for Speech, Music, and Acoustic Events
Moreno La Quatra
Alkis Koudounas
Lorenzo Vaiani
Elena Baralis
Luca Cagliero
Paolo Garza
Sabato Marco Siniscalchi
24
10
0
02 May 2024