ResearchTrend.AI
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,021 papers shown
Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation
Yimin Deng
Jianzong Wang
Xulong Zhang
Ning Cheng
Jing Xiao
16
0
0
01 May 2024
Self-supervised Pre-training of Text Recognizers
M. Kišš
Michal Hradiš
SSL
32
1
0
01 May 2024
Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model
Wen-Ling Lei
Li Liu
Jun Wang
DiffM
27
2
0
30 Apr 2024
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality
Tiantian Feng
Xuan Shi
Rahul Gupta
Shrikanth S. Narayanan
41
0
0
27 Apr 2024
HybridVC: Efficient Voice Style Conversion with Text and Audio Prompts
Xinlei Niu
Jing Zhang
Charles Patrick Martin
18
2
0
24 Apr 2024
Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance
Tsubasa Ochiai
Kazuma Iwamoto
Marc Delcroix
Rintaro Ikeshita
Hiroshi Sato
Shoko Araki
Shigeru Katagiri
16
2
0
23 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
35
16
0
23 Apr 2024
Retrieval-Augmented Audio Deepfake Detection
Zuheng Kang
Yayun He
Botao Zhao
Xiaoyang Qu
Junqing Peng
Jing Xiao
Jianzong Wang
22
7
0
22 Apr 2024
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
46
1
0
16 Apr 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
38
19
0
15 Apr 2024
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild
K. Chumachenko
Alexandros Iosifidis
M. Gabbouj
19
6
0
13 Apr 2024
Voice Attribute Editing with Text Prompt
Zheng-Yan Sheng
Yang Ai
Li-Juan Liu
Jia Pan
Zhenhua Ling
26
4
0
13 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
26
1
0
10 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
37
4
0
10 Apr 2024
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
29
10
0
09 Apr 2024
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
Yiwei Guo
Chenrun Wang
Yifan Yang
Hankun Wang
Ziyang Ma
...
Hanzheng Li
Shuai Fan
Hui Zhang
Xie Chen
Kai Yu
28
1
0
09 Apr 2024
Test-Time Training for Depression Detection
Sri Harsha Dumpala
Chandramouli Shama Sastry
Rudolf Uher
Sageev Oore
43
0
0
07 Apr 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Detai Xin
Xu Tan
Kai Shen
Zeqian Ju
Dongchao Yang
...
Shinnosuke Takamichi
Hiroshi Saruwatari
Shujie Liu
Jinyu Li
Sheng Zhao
29
23
0
04 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
65
39
0
03 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
28
20
0
03 Apr 2024
LastResort at SemEval-2024 Task 3: Exploring Multimodal Emotion Cause Pair Extraction as Sequence Labelling Task
Suyash Vardhan Mathur
Akshett Rai Jindal
Hardik Mittal
Manish Shrivastava
28
1
0
02 Apr 2024
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Xu He
Qiaochu Huang
Zhensong Zhang
Zhiwei Lin
Zhiyong Wu
Sicheng Yang
Minglei Li
Zhiyi Chen
Songcen Xu
Xiaofei Wu
24
15
0
02 Apr 2024
Transfer Learning from Whisper for Microscopic Intelligibility Prediction
Paul Best
Santiago Cuervo
R. Marxer
22
1
0
02 Apr 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
21
42
0
31 Mar 2024
A Backdoor Approach with Inverted Labels Using Dirty Label-Flipping Attacks
Orson Mengara
AAML
30
4
0
29 Mar 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David F. Harwath
57
55
0
25 Mar 2024
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
Wenxuan Wu
Xueyuan Chen
Xixin Wu
Haizhou Li
Helen M. Meng
21
1
0
24 Mar 2024
Wav2Gloss: Generating Interlinear Glossed Text from Speech
Taiqi He
Kwanghee Choi
Lindia Tjuatja
Nathaniel R. Robinson
Jiatong Shi
Shinji Watanabe
Graham Neubig
David R. Mortensen
Lori S. Levin
VLM
30
2
0
19 Mar 2024
MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
28
4
0
19 Mar 2024
An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
22
1
0
19 Mar 2024
MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
Hanlei Zhang
Xin Wang
Hua Xu
Qianrui Zhou
Kai Gao
Jianhua Su
Jinyue Zhao
Wenrui Li
Yanting Chen
26
2
0
16 Mar 2024
ScanTalk: 3D Talking Heads from Unregistered Scans
Federico Nocentini
T. Besnier
Claudio Ferrari
Sylvain Arguillere
Stefano Berretti
Mohamed Daoudi
30
5
0
16 Mar 2024
Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference
Fan Zhang
Zhaohan Wang
Xin Lyu
Siyuan Zhao
Mengjian Li
...
Naye Ji
Hui Du
Fuxing Gao
Hao Wu
Shunman Li
VGen
38
3
0
16 Mar 2024
Improving Acoustic Word Embeddings through Correspondence Training of Self-supervised Speech Representations
Amit Meghanani
Thomas Hain
SSL
35
1
0
13 Mar 2024
An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Mehdi Rezagholizadeh
Boxing Chen
Tiago H. Falk
54
1
0
13 Mar 2024
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
Jiayu Du
Jinpeng Li
Guoguo Chen
Wei-Qiang Zhang
ELM
24
3
0
13 Mar 2024
SCORE: Self-supervised Correspondence Fine-tuning for Improved Content Representations
Amit Meghanani
Thomas Hain
33
3
0
10 Mar 2024
A robust audio deepfake detection system via multi-view feature
Yujie Yang
Haochen Qin
Hang Zhou
Chengcheng Wang
Tianyu Guo
Kai Han
Yunhe Wang
38
24
0
04 Mar 2024
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
Tahir Javed
J. Nawale
E. George
Sakshi Joshi
Kaushal Bhogale
...
M. ManickamK
C. V. Vaijayanthi
Krishnan Srinivasa Raghavan Karunganni
Pratyush Kumar
Mitesh M Khapra
23
16
0
04 Mar 2024
A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement
Ravi Shankar
Ke Tan
Buye Xu
Anurag Kumar
17
0
0
03 Mar 2024
Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models
Neta Shaul
Uriel Singer
Ricky T. Q. Chen
Matt Le
Ali K. Thabet
Albert Pumarola
Y. Lipman
DiffM
25
4
0
02 Mar 2024
Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification
Mufan Sang
John H. L. Hansen
25
6
0
01 Mar 2024
Compact Speech Translation Models via Discrete Speech Units Pretraining
Tsz Kin Lam
Alexandra Birch
Barry Haddow
43
2
0
29 Feb 2024
Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
Taein Kang
Soyul Han
Sunmook Choi
Jaejin Seo
Sanghyeok Chung
Seungeun Lee
Seungsang Oh
Il-Youp Kwak
41
7
0
27 Feb 2024
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
Luca Zampierin
G. B. Hacene
Bac Nguyen
Mirco Ravanelli
27
2
0
26 Feb 2024
The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning
Nik Vaessen
David A. van Leeuwen
25
3
0
21 Feb 2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
Haibin Wu
Ho-Lam Chung
Yi-Cheng Lin
Yuan-Kuei Wu
Xuanjun Chen
Yu-Chi Pai
Hsiu-Hsuan Wang
Kai-Wei Chang
Alexander H. Liu
Hung-yi Lee
34
18
0
20 Feb 2024
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
Haibin Wu
Huang-Cheng Chou
Kai-Wei Chang
Lucas Goncalves
Jiawei Du
Jyh-Shing Roger Jang
Chi-Chun Lee
Hung-Yi Lee
29
11
0
20 Feb 2024
Handling Ambiguity in Emotion: From Out-of-Domain Detection to Distribution Estimation
Wen Wu
Bo-wen Li
C. Zhang
Chung-Cheng Chiu
Qiujia Li
Junwen Bai
Tara N. Sainath
P. Woodland
14
2
0
20 Feb 2024
StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing
Gaoxiang Cong
Yuankai Qi
Liang-Sheng Li
Amin Beheshti
Zhedong Zhang
A. Hengel
Ming-Hsuan Yang
Chenggang Yan
Qingming Huang
35
12
0
20 Feb 2024