Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2110.13900
Cited By
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"
50 / 1,020 papers shown
Title
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
Carlos Hernandez-Olivan
Marc Delcroix
Tsubasa Ochiai
Daisuke Niizumi
Naohiro Tawara
Tomohiro Nakatani
Shoko Araki
21
2
0
19 Sep 2024
Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference
Edresson Casanova
Ryan Langman
Paarth Neekhara
Shehzeen Samarah Hussain
Jason Chun Lok Li
Subhankar Ghosh
Ante Jukić
Sang-gil Lee
AuLLM
29
2
0
18 Sep 2024
Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
Zhiyong Wang
Ruibo Fu
Zhengqi Wen
Jianhua Tao
Xiaopeng Wang
...
Yi Lu
Yukun Liu
Chenxing Li
Xuefei Liu
Guanjun Li
19
6
0
18 Sep 2024
Simulating Native Speaker Shadowing for Nonnative Speech Assessment with Latent Speech Representations
Haopeng Geng
Daisuke Saito
Nobuaki Minematsu
19
0
0
18 Sep 2024
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
25
9
0
18 Sep 2024
Discrete Unit based Masking for Improving Disentanglement in Voice Conversion
Philip H. Lee
Ismail Rasim Ulgen
Berrak Sisman
16
0
0
17 Sep 2024
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
Yufeng Yang
Desh Raj
Ju Lin
Niko Moritz
J. Jia
...
Egor Lakomkin
Yiteng Huang
Jacob Donley
Jay Mahadeokar
Ozlem Kalinli
9
2
0
17 Sep 2024
SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Peizhuo Liu
Li Wang
Renqiang He
Haorui He
Lei Wang
Huadi Zheng
Jie Shi
Tong Xiao
Zhizheng Wu
21
1
0
17 Sep 2024
Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data
Jing Xu
Daxin Tan
Jiaqi Wang
Xiao Chen
19
0
0
17 Sep 2024
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Li-Wei Chen
Takuya Higuchi
He Bai
Ahmed Hussen Abdelaziz
Alexander Rudnicky
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
Zakaria Aldeneh
36
0
0
16 Sep 2024
Towards Automatic Assessment of Self-Supervised Speech Models using Rank
Zakaria Aldeneh
Vimal Thilak
Takuya Higuchi
B. Theobald
Tatiana Likhomanenko
SSL
67
0
0
16 Sep 2024
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance
Huang-Cheng Chou
Haibin Wu
Chi-Chun Lee
22
0
0
16 Sep 2024
Self-supervised Speech Models for Word-Level Stuttered Speech Detection
Yi-Jen Shih
Zoi Gkalitsiou
A. Dimakis
David Harwath
21
1
0
16 Sep 2024
Speech as a Biomarker for Disease Detection
Catarina Botelho
A. Abad
Tanja Schultz
Isabel Trancoso
26
1
0
16 Sep 2024
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
21
2
0
16 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
24
3
0
16 Sep 2024
Explaining Deep Learning Embeddings for Speech Emotion Recognition by Predicting Interpretable Acoustic Features
Satvik Dixit
Daniel M. Low
Gasser Elbanna
Fabio Catania
Satrajit S. Ghosh
21
0
0
14 Sep 2024
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration
Masao Someki
Kwanghee Choi
Siddhant Arora
William Chen
Samuele Cornell
Jionghao Han
Yifan Peng
Jiatong Shi
Vaibhav Srivastav
Shinji Watanabe
VLM
25
0
0
14 Sep 2024
Leveraging Self-Supervised Learning for Speaker Diarization
Jiangyu Han
Federico Landini
Johan Rohdin
Anna Silnova
Mireia Díez
Lukas Burget
28
1
0
14 Sep 2024
Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Xugang Lu
Lei Li
15
0
0
14 Sep 2024
Joint Semantic Knowledge Distillation and Masked Acoustic Modeling for Full-band Speech Restoration with Improved Intelligibility
Xiaoyu Liu
Xu Li
Joan Serra
Santiago Pascual
11
3
0
14 Sep 2024
HLTCOE JHU Submission to the Voice Privacy Challenge 2024
Henry Li Xinyuan
Zexin Cai
Ashi Garg
Kevin Duh
Leibny Paola García-Perera
Sanjeev Khudanpur
Nicholas Andrews
Matthew Wiesner
14
3
0
13 Sep 2024
Exploring SSL Discrete Tokens for Multilingual ASR
Mingyu Cui
Daxin Tan
Yifan Yang
Dingdong Wang
Huimeng Wang
Xiao Chen
Xie Chen
Xunying Liu
23
1
0
13 Sep 2024
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR
Mingyu Cui
Yifan Yang
Jiajun Deng
Jiawen Kang
Shujie Hu
Tianzi Wang
Zhaoqing Li
Shiliang Zhang
Xie Chen
Xunying Liu
18
1
0
13 Sep 2024
NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training
Minglun Han
Ye Bai
Chen Shen
Youjia Huang
Mingkun Huang
Zehua Lin
Linhao Dong
Lu Lu
Yuxuan Wang
32
0
0
13 Sep 2024
Unified Audio Event Detection
Yidi Jiang
Ruijie Tao
Wen Huang
Qian Chen
Wen Wang
25
0
0
13 Sep 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Lingwei Meng
Shujie Hu
Jiawen Kang
Zhaoqing Li
Yuejiao Wang
Wenxuan Wu
Xixin Wu
Xunying Liu
Helen Meng
AuLLM
62
1
0
13 Sep 2024
Towards Quantifying and Reducing Language Mismatch Effects in Cross-Lingual Speech Anti-Spoofing
Tianchi Liu
Ivan Kukanov
Zihan Pan
Qiongqiong Wang
Hardik B. Sailor
K. Lee
21
0
0
12 Sep 2024
Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations
Wangjin Zhou
Fengrun Zhang
Yiming Liu
Wenhao Guan
Yi Zhao
He Qu
18
0
0
12 Sep 2024
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models
Beilong Tang
Bang Zeng
Ming Li
23
2
0
12 Sep 2024
Universal Pooling Method of Multi-layer Features from Pretrained Models for Speaker Verification
Jin Sob Kim
Hyun Joon Park
Wooseok Shin
Sung Won Han
SLR
27
0
0
12 Sep 2024
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm
Yuning Wu
Jiatong Shi
Yifeng Yu
Yuxun Tang
Tao Qian
Yueqian Lin
Jionghao Han
Xinyi Bai
Shinji Watanabe
Qin Jin
18
3
0
11 Sep 2024
The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
Wen-Chin Huang
Szu-Wei Fu
Erica Cooper
Ryandhimas E. Zezario
T. Toda
Hsin-Min Wang
Junichi Yamagishi
Yu Tsao
19
5
0
11 Sep 2024
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
Taejin Park
Ivan Medennikov
Kunal Dhawan
Weiqing Wang
He Huang
Nithin Rao Koluguri
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
19
2
0
10 Sep 2024
SpeechTaxi: On Multilingual Semantic Speech Classification
Lennart Keller
Goran Glavaš
23
0
0
10 Sep 2024
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
W. Zhang
Shuo Sun
Bin Wang
Xunlong Zou
Zhuohan Liu
Yingxu He
Geyu Lin
Nancy F. Chen
A. Aw
AuLLM
65
1
0
10 Sep 2024
Estimating the Completeness of Discrete Speech Units
Sung-Lin Yeh
Hao Tang
15
1
0
09 Sep 2024
Continuous Learning of Transformer-based Audio Deepfake Detection
Tuan Duy Nguyen Le
Kah Kuan Teh
Huy Dat Tran
ViT
18
2
0
09 Sep 2024
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
Andy T. Liu
Yi-Cheng Lin
Haibin Wu
Stefan Winkler
Hung-yi Lee
22
1
0
09 Sep 2024
Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations
Xinran Li
Xiaomao Fan
Q. Wu
Xiaojiang Peng
Y. Li
Mamba
19
1
0
08 Sep 2024
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue
Junkai Wu
Xulin Fan
Bo-Ru Lu
Xilin Jiang
N. Mesgarani
M. Hasegawa-Johnson
Mari Ostendorf
AuLLM
ELM
56
0
0
07 Sep 2024
Property Neurons in Self-Supervised Speech Transformers
T. Lin
Guan-Ting Lin
Hung-yi Lee
Hao Tang
MILM
20
0
0
07 Sep 2024
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Jiaqi Li
Dongmei Wang
Xiaofei Wang
Yao Qian
Long Zhou
...
Junkun Chen
Sheng Zhao
Jinyu Li
Zhizheng Wu
Michael Zeng
AuLLM
17
2
0
06 Sep 2024
Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization
Zexin Cai
Henry Li Xinyuan
Ashi Garg
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Nicholas Andrews
Matthew Wiesner
16
0
0
05 Sep 2024
STAB: Speech Tokenizer Assessment Benchmark
Shikhar Vashishth
Harman Singh
Shikhar Bharadwaj
Sriram Ganapathy
Chulayuth Asawaroengchai
Kartik Audhkhasi
Andrew Rosenberg
Ankur Bapna
Bhuvana Ramabhadran
43
0
0
04 Sep 2024
The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge
Shutong Niu
Ruoyu Wang
Jun Du
Gaobin Yang
Yanhui Tu
...
Tian Gao
Genshun Wan
Feng Ma
Jia Pan
Jianqing Gao
21
4
0
03 Sep 2024
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
Yiwei Guo
Zhihan Li
Junjie Li
Chenpeng Du
Hankun Wang
Shuai Wang
Xie Chen
Kai Yu
19
0
0
03 Sep 2024
SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis
Haohan Guo
Fenglong Xie
Kun Xie
Dongchao Yang
Dake Guo
Xixin Wu
Helen Meng
21
4
0
02 Sep 2024
Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
Hao Shi
Yuan Gao
Zhaoheng Ni
Tatsuya Kawahara
21
1
0
01 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Shunsi Zhang
Shunsi Zhang
Zhizheng Wu
23
37
0
01 Sep 2024
Previous
1
2
3
4
5
6
...
19
20
21
Next