WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,021 papers shown
Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing
Martin Lebourdais
Théo Mariotte
Antonio Almudévar
Marie Tahon
Alfonso Ortega
28
0
0
19 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
24
1
0
18 Jun 2024
Performant ASR Models for Medical Entities in Accented Speech
Tejumade Afonja
Tobi Olatunji
Sewade Ogun
Naome A. Etori
A. Owodunni
Moshood Yekini
21
2
0
18 Jun 2024
Interface Design for Self-Supervised Speech Models
Yi-Jen Shih
David Harwath
54
1
0
18 Jun 2024
A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding
G. Laperriere
Sahar Ghannay
Bassam Jabaian
Yannick Esteve
19
0
0
17 Jun 2024
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
Anbai Jiang
Bing Han
Zhiqiang Lv
Yufeng Deng
Wei-Qiang Zhang
Xie Chen
Yanmin Qian
Jia Liu
Pingyi Fan
19
3
0
17 Jun 2024
NAST: Noise Aware Speech Tokenization for Speech Language Models
Shoval Messica
Yossi Adi
19
6
0
16 Jun 2024
Robust Channel Learning for Large-Scale Radio Speaker Verification
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
Xugang Lu
35
2
0
16 Jun 2024
SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction
Yuxun Tang
Jiatong Shi
Yuning Wu
Qin Jin
21
8
0
16 Jun 2024
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi
J. Duret
Salah Zaiem
Luca Della Libera
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
29
9
0
15 Jun 2024
Double Multi-Head Attention Multimodal System for Odyssey 2024 Speech Emotion Recognition Challenge
Federico Costa
Miquel India
Javier Hernando
26
2
0
15 Jun 2024
SOA: Reducing Domain Mismatch in SSL Pipeline by Speech Only Adaptation for Low Resource ASR
Natarajan Balaji Shankar
Ruchao Fan
Abeer Alwan
27
0
0
15 Jun 2024
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
Ruchao Fan
Natarajan Balaji Shankar
Abeer Alwan
19
7
0
15 Jun 2024
Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment
Joseph Liu
Mahesh Kumar Nandwana
Janne Pylkkönen
Hannes Heikinheimo
Morgan McGuire
27
0
0
14 Jun 2024
One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model
Zhaoqing Li
Haoning Xu
Tianzi Wang
Shoukang Hu
Zengrui Jin
Shujie Hu
Jiajun Deng
Mingyu Cui
Mengzhe Geng
Xunying Liu
MQ
16
1
0
14 Jun 2024
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
Guinan Li
Jiajun Deng
Youjun Chen
Mengzhe Geng
Shujie Hu
...
Zengrui Jin
Tianzi Wang
Xurong Xie
Helen Meng
Xunying Liu
VLM
21
0
0
14 Jun 2024
On the Evaluation of Speech Foundation Models for Spoken Language Understanding
Siddhant Arora
Ankita Pasad
Chung-Ming Chien
Jionghao Han
Roshan S. Sharma
...
William Chen
Suwon Shon
Hung-yi Lee
Karen Livescu
Shinji Watanabe
ELM
35
4
0
14 Jun 2024
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
Haoyu Wang
Guoqiang Hu
Guodong Lin
Wei-Qiang Zhang
Jian Li
20
1
0
14 Jun 2024
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask
Tianzi Wang
Xurong Xie
Zhaoqing Li
Shoukang Hu
Zengrui Jin
...
Shujie Hu
Mengzhe Geng
Guinan Li
Helen Meng
Xunying Liu
21
0
0
14 Jun 2024
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
Jiatong Shi
Xutai Ma
Hirofumi Inaguma
Anna Y. Sun
Shinji Watanabe
47
7
0
14 Jun 2024
On the Encoding of Gender in Transformer-based ASR Representations
Aravind Krishnan
Badr M. Abdullah
Dietrich Klakow
41
2
0
14 Jun 2024
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy
Linhan Ma
Xinfa Zhu
Yuanjun Lv
Zhichao Wang
Ziqian Wang
Wendi He
Hongbin Zhou
Lei Xie
26
2
0
14 Jun 2024
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
Suwon Shon
Kwangyoun Kim
Yi-Te Hsu
Prashant Sridhar
Shinji Watanabe
Karen Livescu
AuLLM
39
2
0
13 Jun 2024
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
Mukhtar Mohamed
Oli Danyi Liu
Hao Tang
Sharon Goldwater
SSL
28
2
0
13 Jun 2024
LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks
Amit Meghanani
Thomas Hain
28
1
0
13 Jun 2024
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
Dehua Tao
Daxin Tan
Y. Yeung
Xiao Chen
Tan Lee
22
3
0
13 Jun 2024
Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning
Arnav Goel
Medha Hira
Anubha Gupta
14
0
0
13 Jun 2024
SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models
Yuxun Tang
Yuning Wu
Jiatong Shi
Qin Jin
47
5
0
13 Jun 2024
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation
Yifeng Yu
Jiatong Shi
Yuning Wu
Shinji Watanabe
36
3
0
13 Jun 2024
Self-Supervised Speech Representations are More Phonetic than Semantic
Kwanghee Choi
Ankita Pasad
Tomohiko Nakamura
Satoru Fukayama
Karen Livescu
Shinji Watanabe
29
14
0
12 Jun 2024
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
Chun Yin
Tai-Shih Chi
Yu Tsao
Hsin-Min Wang
27
0
0
12 Jun 2024
TokSing: Singing Voice Synthesis based on Discrete Tokens
Yuning Wu
Chunlei Zhang
Jiatong Shi
Yuxun Tang
Shan Yang
Qin Jin
19
6
0
12 Jun 2024
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
Yue Li
Xinsheng Wang
Li Zhang
Lei Xie
29
1
0
12 Jun 2024
DCASE 2024 Task 4: Sound Event Detection with Heterogeneous Data and Missing Labels
Samuele Cornell
Janek Ebbers
Constance Douwes
Irene Martín-Morató
Manu Harju
A. Mesaros
Romain Serizel
19
13
0
12 Jun 2024
Attentive Merging of Hidden Embeddings from Pre-trained Speech Model for Anti-spoofing Detection
Zihan Pan
Tianchi Liu
Hardik B. Sailor
Qiongqiong Wang
40
10
0
12 Jun 2024
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
Masaya Kawamura
Ryuichi Yamamoto
Yuma Shirahata
Takuya Hasumi
Kentaro Tachibana
VLM
18
1
0
12 Jun 2024
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
Eungbeom Kim
Hantae Kim
Kyogu Lee
27
1
0
12 Jun 2024
Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations
Bulat Khaertdinov
Pedro Jeuris
Annanda Sousa
Enrique Hortal
17
1
0
12 Jun 2024
Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions
Anfeng Xu
Kevin Huang
Tiantian Feng
Lue Shen
Helen Tager-Flusberg
Shrikanth Narayanan
14
2
0
12 Jun 2024
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Se Jin Park
Chae Won Kim
Hyeongseop Rha
Minsu Kim
Joanna Hong
Jeong Hun Yeo
Yong Man Ro
CVBM
AuLLM
37
6
0
12 Jun 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Bing Han
Long Zhou
Shujie Liu
Sanyuan Chen
Lingwei Meng
Yanming Qian
Yanqing Liu
Sheng Zhao
Jinyu Li
Furu Wei
25
13
0
12 Jun 2024
SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition
Tianhao Wang
Lantian Li
D. Wang
23
0
0
12 Jun 2024
GenDistiller: Distilling Pre-trained Language Models based on an Autoregressive Generative Model
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
17
0
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
40
15
0
11 Jun 2024
Sustainable self-supervised learning for speech representations
Luis Lugo
Valentin Vielzeuf
29
2
0
11 Jun 2024
Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment
Takuto Igarashi
Yuki Saito
Kentaro Seki
Shinnosuke Takamichi
Ryuichi Yamamoto
Kentaro Tachibana
Hiroshi Saruwatari
19
1
0
11 Jun 2024
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Ziyang Ma
Mingjie Chen
Hezhao Zhang
Zhisheng Zheng
Wenxi Chen
Xiquan Li
Jiaxin Ye
Xie Chen
Thomas Hain
25
12
0
11 Jun 2024
The Reasonable Effectiveness of Speaker Embeddings for Violence Detection
Sarthak Jain
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
14
0
0
10 Jun 2024
PERSONA: An Application for Emotion Recognition, Gender Recognition and Age Estimation
Devyani Koshal
Orchid Chetia Phukan
Sarthak Jain
Arun Balaji Buduru
Rajesh Sharma
LLMAG
16
0
0
10 Jun 2024
mHuBERT-147: A Compact Multilingual HuBERT Model
Marcely Zanon Boito
Vivek Iyer
Nikolaos Lagos
Laurent Besacier
Ioan Calapodescu
VLM
59
8
0
10 Jun 2024