Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2110.13900
Cited By
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"
50 / 1,021 papers shown
Title
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
Xilin Jiang
Yinghao Aaron Li
N. Mesgarani
CLL
11
1
0
29 May 2023
Investigating Pre-trained Audio Encoders in the Low-Resource Condition
Haomiao Yang
Jinming Zhao
Gholamreza Haffari
Ehsan Shareghi
14
6
0
28 May 2023
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
11
37
0
28 May 2023
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification
Ju-Sung Heo
Chan-yeong Lim
Ju-ho Kim
Hyun-Seo Shin
Ha-Jin Yu
6
2
0
27 May 2023
Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation
Yuta Nishikawa
Satoshi Nakamura
22
4
0
26 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Seong-Hyun Park
Bohyung Kim
Tae-Hyun Oh
19
1
0
26 May 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Wangyou Zhang
Y. Qian
33
10
0
25 May 2023
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Tianrui Wang
Long Zhou
Zi-Hua Zhang
Yu-Huan Wu
Shujie Liu
Yashesh Gaur
Zhuo Chen
Jinyu Li
Furu Wei
32
100
0
25 May 2023
Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model
Aoi Ito
Shota Horiguchi
SSL
6
2
0
24 May 2023
LAraBench: Benchmarking Arabic AI with Large Language Models
Ahmed Abdelali
Hamdy Mubarak
Shammur A. Chowdhury
Maram Hasanain
Basel Mousi
...
Yousseif Elshahawy
Ahmed M. Ali
Nadir Durrani
Natasa Milic-Frayling
Firoj Alam
ELM
LM&MA
6
18
0
24 May 2023
Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Hiroshi Sato
Ryo Masumura
Tsubasa Ochiai
Marc Delcroix
Takafumi Moriya
...
Kentaro Shinayama
Saki Mizuno
Mana Ihori
Tomohiro Tanaka
Nobukatsu Hojo
18
5
0
24 May 2023
On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications
Vamsikrishna Chemudupati
Marzieh S. Tahaei
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Mehdi Rezagholizadeh
Boxing Chen
Tiago H. Falk
SSL
29
6
0
23 May 2023
Understanding Spoken Language Development of Children with ASD Using Pre-trained Speech Embeddings
Anfeng Xu
Rajat Hebbar
Rimita Lahiri
Tiantian Feng
Lindsay K. Butler
Lue Shen
Helen Tager-Flusberg
Shrikanth Narayanan
11
10
0
23 May 2023
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
37
3
0
23 May 2023
Improving speech translation by fusing speech and text
Wenbiao Yin
Zhicheng Liu
Chengqi Zhao
Tao Wang
Jian-Fei Tong
Rong Ye
13
4
0
23 May 2023
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
Eklavya Sarkar
Mathew Magimai.-Doss
8
11
0
23 May 2023
i-Code Studio: A Configurable and Composable Framework for Integrative AI
Yuwei Fang
Mahmoud Khademi
Chenguang Zhu
Ziyi Yang
Reid Pryzant
...
Yao Qian
Takuya Yoshioka
Lu Yuan
Michael Zeng
Xuedong Huang
25
2
0
23 May 2023
Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition
Yaoting Wang
Yuanchao Li
Paul Pu Liang
Louis-Philippe Morency
P. Bell
Catherine Lai
CVBM
21
8
0
23 May 2023
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization
Marc Delcroix
Naohiro Tawara
Mireia Díez
Federico Landini
Anna Silnova
A. Ogawa
Tomohiro Nakatani
L. Burget
S. Araki
24
5
0
23 May 2023
SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents
Shuzheng Si
Wen-Cheng Ma
Haoyu Gao
Yuchuan Wu
Ting-En Lin
Yinpei Dai
Hangyu Li
Rui Yan
Fei Huang
Yongbin Li
AuLLM
24
27
0
22 May 2023
The defender's perspective on automatic speaker verification: An overview
Haibin Wu
Jiawen Kang
Lingwei Meng
H. Meng
Hung-yi Lee
AAML
17
14
0
22 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Ziyi Yang
Mahmoud Khademi
Yichong Xu
Reid Pryzant
Yuwei Fang
...
Yu Shi
Lu Yuan
Takuya Yoshioka
Michael Zeng
Xuedong Huang
17
2
0
21 May 2023
Self-supervised representations in speech-based depression detection
Wen Wu
C. Zhang
P. Woodland
11
23
0
20 May 2023
North Sámi Dialect Identification with Self-supervised Speech Models
Sofoklis Kakouros
Katri Hiovain-Asikainen
17
4
0
19 May 2023
Scaling laws for language encoding models in fMRI
Richard Antonello
Aditya R. Vaidya
Alexander G. Huth
MedIm
14
55
0
19 May 2023
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
Jiyang Tang
William Chen
Xuankai Chang
Shinji Watanabe
B. MacWhinney
11
10
0
19 May 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Kangwook Jang
Sungnyun Kim
Se-Young Yun
Hoi-Rim Kim
16
5
0
19 May 2023
Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment
Tianshu Yu
Haoyu Gao
Ting-En Lin
Min Yang
Yuchuan Wu
Wen-Cheng Ma
Chao Wang
Fei Huang
Yongbin Li
19
18
0
19 May 2023
Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring
Kaiqi Fu
Shaojun Gao
Shuju Shi
Xiaohai Tian
Wei Li
Zejun Ma
18
2
0
19 May 2023
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
Puyuan Peng
Shang-Wen Li
Okko Rasanen
Abdel-rahman Mohamed
David F. Harwath
SSL
VLM
18
7
0
19 May 2023
DUB: Discrete Unit Back-translation for Speech Translation
Dong Zhang
Rong Ye
Tom Ko
Mingxuan Wang
Yaqian Zhou
11
22
0
19 May 2023
TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition
Tiantian Feng
Rajat Hebbar
Shrikanth Narayanan
25
7
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
16
113
0
18 May 2023
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation
Sicheng Yang
Zhiyong Wu
Minglei Li
Zhensong Zhang
Lei Hao
Weihong Bao
Hao-Wen Zhuang
SLR
16
40
0
18 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
19
17
0
18 May 2023
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering
Heng-Jui Chang
Alexander H. Liu
James R. Glass
SSL
17
20
0
18 May 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
Jiatong Shi
Dan Berrebbi
William Chen
Ho-Lam Chung
En-Pei Hu
...
Xuankai Chang
Shang-Wen Li
Abdel-rahman Mohamed
Hung-yi Lee
Shinji Watanabe
ELM
47
58
0
18 May 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Alexander H. Liu
Heng-Jui Chang
Michael Auli
Wei-Ning Hsu
James R. Glass
16
24
0
17 May 2023
SoundStorm: Efficient Parallel Audio Generation
Zalan Borsos
Matthew Sharifi
Damien Vincent
Eugene Kharitonov
Neil Zeghidour
Marco Tagliasacchi
15
97
0
16 May 2023
Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models
Antoni Bigata Casademunt
Rodrigo Mira
Nikita Drobyshev
Konstantinos Vougioukas
Stavros Petridis
M. Pantic
DiffM
56
1
0
15 May 2023
Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations
Wei-wei Lin
Chenhang He
Man-Wai Mak
Youzhi Tu
19
5
0
14 May 2023
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
Jason (Jinglun) Cai
Monica Sunkara
Xilai Li
Anshu Bhatia
Xiao Pan
S. Bodapati
15
3
0
11 May 2023
Extending Audio Masked Autoencoders Toward Audio Restoration
Zhi-Wei Zhong
Hao Shi
M. Hirano
Kazuki Shimada
Kazuya Tateishi
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
13
4
0
11 May 2023
An Exploration into the Performance of Unsupervised Cross-Task Speech Representations for "In the Wild'' Edge Applications
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Mehdi Rezagholizadeh
Tiago H. Falk
SSL
6
2
0
09 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
14
3
0
09 May 2023
LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization
Peng Lu
Ahmad Rashid
I. Kobyzev
Mehdi Rezagholizadeh
Philippe Langlais
6
0
0
08 May 2023
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
Sicheng Yang
Zhiyong Wu
Minglei Li
Zhensong Zhang
Lei Hao
Weihong Bao
Ming Cheng
Long Xiao
22
64
0
08 May 2023
Improved Vocal Effort Transfer Vector Estimation for Vocal Effort-Robust Speaker Verification
Iván López-Espejo
Santi Prieto
Alfonso Ortega
EDUARDO LLEIDA SOLANO
9
0
0
03 May 2023
Self-supervised learning for infant cry analysis
Arsenii Gorin
Cem Subakan
Sajjad Abdoli
Junhao Wang
Samantha Latremouille
Charles C. Onu
25
9
0
02 May 2023
Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding
Juan Pablo Zuluaga
Iuliia Nigmatulina
Amrutha Prasad
P. Motlícek
Driss Khalil
...
Allan Tart
Igor Szöke
Vincent Lenders
M. Rigault
K. Choukri
11
2
0
02 May 2023
Previous
1
2
3
...
15
16
17
...
19
20
21
Next