Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2110.13900
Cited By
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"
50 / 1,020 papers shown
Title
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Yujin Wang
Changli Tang
Ziyang Ma
Zhisheng Zheng
Xie Chen
Weiqiang Zhang
14
1
0
27 Oct 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
58
19
0
27 Oct 2022
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Jingyi Li
Weiping Tu
Li Xiao
41
91
0
27 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
Qiu-shi Zhu
Long Zhou
Jie M. Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLM
SSL
40
36
0
27 Oct 2022
UFO2: A unified pre-training framework for online and offline speech recognition
Li Fu
Siqi Li
Qingtao Li
L. Deng
Fangzhu Li
Lu Fan
Meng Chen
Xiaodong He
OffRL
8
7
0
26 Oct 2022
AVES: Animal Vocalization Encoder based on Self-Supervision
Masato Hagiwara
CLIP
VLM
AI4TS
11
23
0
26 Oct 2022
Real-time Speech Interruption Analysis: From Cloud to Client Deployment
Quchen Fu
Szu-Wei Fu
Yaran Fan
Yu-Huan Wu
Zhuo Chen
J. Gupchup
Ross Cutler
18
0
0
24 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings
Jian Zhu
Zuoyu Tian
Yadong Liu
Cong Zhang
Chia-wen Lo
SSL
22
2
0
23 Oct 2022
Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech
Cheol Jun Cho
Peter Wu
Abdel-rahman Mohamed
Gopala K. Anumanchipalli
19
29
0
21 Oct 2022
Large-scale learning of generalised representations for speaker recognition
Jee-weon Jung
Hee-Soo Heo
Bong-Jin Lee
Jaesong Lee
Hye-jin Shim
Youngki Kwon
Joon Son Chung
Shinji Watanabe
CVBM
10
6
0
20 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
8
19
0
19 Oct 2022
SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning
Zuheng Kang
Jianzong Wang
Junqing Peng
Jing Xiao
14
3
0
18 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
11
33
0
16 Oct 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Ruchao Fan
Yiming Wang
Yashesh Gaur
Jinyu Li
22
7
0
16 Oct 2022
Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations
Themos Stafylakis
Ladislav Mošner
Sofoklis Kakouros
Oldrich Plchot
L. Burget
J. Černocký
SSL
24
8
0
15 Oct 2022
Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition
Jakob Poncelet
Hugo Van hamme
10
1
0
14 Oct 2022
TransFusion: Transcribing Speech with Multinomial Diffusion
Matthew Baas
Kevin Eloff
Herman Kamper
DiffM
6
4
0
14 Oct 2022
Training speech emotion classifier without categorical annotations
Meysam Shamsi
Marie Tahon
10
2
0
14 Oct 2022
Experiments on Turkish ASR with Self-Supervised Speech Representation Learning
Ali Safaya
E. Erzin
11
1
0
13 Oct 2022
On Compressing Sequences for Self-Supervised Speech Models
Yen Meng
Hsuan-Jui Chen
Jiatong Shi
Shinji Watanabe
Paola García
Hung-yi Lee
Hao Tang
SSL
6
14
0
13 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Nigel G. Ward
9
47
0
13 Oct 2022
On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding
G. Laperriere
Valentin Pelloin
Mickael Rouvier
Themos Stafylakis
Yannick Esteve
22
9
0
11 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
Chutong Meng
Junyi Ao
Tom Ko
Mingxuan Wang
Haizhou Li
SSL
27
6
0
08 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Zi-Hua Zhang
Long Zhou
Junyi Ao
Shujie Liu
Lirong Dai
Jinyu Li
Furu Wei
61
57
0
07 Oct 2022
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
32
18
0
05 Oct 2022
Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining
H. S. Bovbjerg
Z. Tan
VLM
17
3
0
04 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David F. Harwath
VLM
CLIP
38
32
0
03 Oct 2022
Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio
Yan Gao
Javier Fernandez-Marques
Titouan Parcollet
Pedro Porto Buarque de Gusmão
Nicholas D. Lane
10
9
0
30 Sep 2022
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling
Itai Gat
Felix Kreuk
Tu Nguyen
Ann Lee
Jade Copet
Gabriel Synnaeve
Emmanuel Dupoux
Yossi Adi
18
11
0
30 Sep 2022
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
17
207
0
30 Sep 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
17
54
0
30 Sep 2022
Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization
Xiaokang Zhao
Qiu-shi Zhu
Jie M. Zhang
12
4
0
28 Sep 2022
MeWEHV: Mel and Wave Embeddings for Human Voice Tasks
Andrés Vasco-Carofilis
Laura Fernández-Robles
Enrique Alegre
Eduardo FIDALGO
38
1
0
28 Sep 2022
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022
Qutang Cai
Guoqiang Hong
Zhijian Ye
Ximin Li
Haizhou Li
25
7
0
23 Sep 2022
The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022
Gang Liu
Tianyan Zhou
Yong Zhao
Yu Wu
Zhuo Chen
Yao Qian
Jian Wu
9
1
0
22 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
R. Olivier
H. Abdullah
Bhiksha Raj
AAML
11
1
0
17 Sep 2022
Overlapped speech and gender detection with WavLM pre-trained features
Martin Lebourdais
Marie Tahon
Antoine Laurent
S. Meignier
19
17
0
09 Sep 2022
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization
Dongmei Wang
Xiong Xiao
Naoyuki Kanda
Takuya Yoshioka
Jian Wu
10
25
0
27 Aug 2022
The ReprGesture entry to the GENEA Challenge 2022
Sicheng Yang
Zhiyong Wu
Minglei Li
Mengchen Zhao
Jiuxin Lin
Liyang Chen
Weihong Bao
17
11
0
25 Aug 2022
3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment
Fu-An Chao
Tien-Hong Lo
Tzu-I Wu
Yao-Ting Sung
Berlin Chen
18
40
0
19 Aug 2022
C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification
Chunlei Zhang
Dong Yu
14
17
0
15 Aug 2022
Utterance-by-utterance overlap-aware neural diarization with Graph-PIT
K. Kinoshita
Thilo von Neumann
Marc Delcroix
Christoph Boeddeker
Reinhold Haeb-Umbach
18
4
0
28 Jul 2022
Dive into Big Model Training
Qinghua Liu
Yuxiang Jiang
MoMe
AI4CE
LRM
13
3
0
25 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
6
8
0
19 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
9
25
0
14 Jul 2022
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription
Xianrui Zheng
C. Zhang
P. Woodland
21
16
0
08 Jul 2022
Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track
Tilak Purohit
Imen Ben Mahmoud
Bogdan Vlasenko
Mathew Magimai.-Doss
SSL
15
8
0
23 Jun 2022
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement
Or Tal
Moshe Mandel
Felix Kreuk
Yossi Adi
AAML
4
8
0
22 Jun 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Chengyi Wang
Yiming Wang
Yu Wu
Sanyuan Chen
Jinyu Li
Shujie Liu
Furu Wei
SSL
17
18
0
21 Jun 2022
DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
Ruchao Fan
Abeer Alwan
11
30
0
16 Jun 2022
Previous
1
2
3
...
18
19
20
21
Next