WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL
arXiv: 2110.13900

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,019 papers shown
Title
Temporal-Frequency State Space Duality: An Efficient Paradigm for Speech Emotion Recognition
Jiaqi Zhao
Fei Wang
Kun Li
Yanyan Wei
Shengeng Tang
Shu Zhao
Xiao Sun
Mamba
91
2
0
22 Dec 2024
Autoregressive Speech Synthesis with Next-Distribution Prediction
Xinfa Zhu
WenJie Tian
Lei Xie
VLM
165
4
0
22 Dec 2024
LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration
Sangmin Lee
Woo-Jin Chung
Hong-Goo Kang
65
0
0
19 Dec 2024
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
Zhoulin Ji
Chenhao Lin
Hang Wang
Chao Shen
94
0
0
12 Dec 2024
Investigating Acoustic-Textual Emotional Inconsistency Information for Automatic Depression Detection
Rongfeng Su
Changqing Xu
Xinyi Wu
Feng Xu
Xie Chen
Lan Wang
Nan Yan
29
0
0
09 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
54
0
0
07 Dec 2024
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing
Yen-Ju Lu
Jing Liu
Thomas Thebaud
Laureano Moro Velázquez
Ariya Rastrow
Najim Dehak
Jesus Villalba
65
1
0
05 Dec 2024
FreeCodec: A disentangled neural speech codec with fewer tokens
Youqiang Zheng
Weiping Tu
Yueteng Kang
Jie Chen
Yike Zhang
Li Xiao
Yuhong Yang
Long Ma
62
1
0
02 Dec 2024
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
68
9
0
29 Nov 2024
How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario
Shih-Heng Wang
Zih-Ching Chen
Jiatong Shi
Ming To Chuang
Guan-Ting Lin
Kuan Po Huang
David F. Harwath
Shang-Wen Li
Hung-yi Lee
70
1
0
27 Nov 2024
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition
Shih-Heng Wang
Jiatong Shi
Chien-yu Huang
Shinji Watanabe
Hung-yi Lee
59
0
0
27 Nov 2024
Multi-Resolution Generative Modeling of Human Motion from Limited Data
David Eduardo Moreno-Villamarín
A. Hilsmann
Peter Eisert
DiffM
3DH
76
0
0
25 Nov 2024
SKQVC: One-Shot Voice Conversion by K-Means Quantization with Self-Supervised Speech Representations
Youngjun Sim
Jinsung Yoon
Young-Joo Suh
64
0
0
25 Nov 2024
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
Jiawei Yu
Y. Li
Xiaosong Qiao
Huan Zhao
Xiaofeng Zhao
Wei Tang
M. Zhang
Hao Yang
Jinsong Su
63
0
0
20 Nov 2024
An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems
Jingyu Li
Aemon Yat Fei Chiu
Tan Lee
54
0
0
18 Nov 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation
Kuiyuan Zhang
Zhongyun Hua
Yushu Zhang
Yifang Guo
Tao Xiang
21
0
0
14 Nov 2024
Investigating the Effectiveness of Explainability Methods in Parkinson's Detection from Speech
Eleonora Mancini
Francesco Paissan
Paolo Torroni
Mirco Ravanelli
Cem Subakan
42
0
0
12 Nov 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
Yoshiki Masuyama
Koichi Miyazaki
Masato Murata
Mamba
28
0
0
11 Nov 2024
CTC-Assisted LLM-Based Contextual ASR
Guanrou Yang
Z. Ma
Zhifu Gao
Shiliang Zhang
Xie Chen
21
2
0
10 Nov 2024
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Shashi Kumar
Iuliia Thorbecke
Sergio Burdisso
Esaú Villatoro-Tello
M. Errecalde
Kadri Hacioğlu
Pradeep Rangappa
P. Motlícek
A. Ganapathiraju
Andreas Stolcke
41
1
0
06 Nov 2024
MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models
Wen-Chin Huang
Erica Cooper
T. Toda
21
4
0
06 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
X. Li
Shuai Wang
H. Li
24
3
0
05 Nov 2024
Real-Time Scream Detection and Position Estimation for Worker Safety in Construction Sites
Bikalpa Gautam
Anmol Guragain
Sarthak Giri
24
0
0
05 Nov 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
37
3
0
04 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
26
0
0
31 Oct 2024
An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions
Theo Clark
Benedetta Cevoli
Eloy de Jong
Timofey Abramski
Jamie Dougherty
SSL
31
0
0
31 Oct 2024
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis
Théodor Lemerle
Harrison Vanderbyl
Vaibhav Srivastav
Nicolas Obin
Axel Roebel
31
1
0
30 Oct 2024
Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features
Abdelrahman Abdelwahab
Abdelrahman Abdelwahab
Ayaan Vaswani
Advait Bharathulwar
Arnav Kommaraju
16
1
0
26 Oct 2024
Personality Analysis from Online Short Video Platforms with Multi-domain Adaptation
Sixu An
X. Sun
Yicong Li
Yu Yang
Guandong Xu
26
0
0
26 Oct 2024
Deep Insights into Cognitive Decline: A Survey of Leveraging Non-Intrusive Modalities with Deep Learning Techniques
David Ortiz-Perez
Manuel Benavent-Lledo
José García Rodríguez
David Tomás
M. Flores Vizcaya-Moreno
16
0
0
24 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr Żelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
41
2
0
23 Oct 2024
Characterizing Robocalls with Multiple Vantage Points
Sathvik Prasad
Aleksandr Nahapetyan
Bradley Reaves
19
0
0
22 Oct 2024
Continuous Speech Tokenizer in Text To Speech
Yixing Li
Ruobing Xie
X. Sun
Yu Cheng
Zhanhui Kang
AuLLM
CLL
31
2
0
22 Oct 2024
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
Yiwei Guo
Zhihan Li
Chenpeng Du
Hankun Wang
Xie Chen
Kai Yu
31
0
0
21 Oct 2024
Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
Suhita Ghosh
Melanie Jouaiti
Arnab Das
Yamini Sinha
Tim Polzehl
Ingo Siegert
Sebastian Stober
18
2
0
20 Oct 2024
Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
Suhita Ghosh
Tim Thiele
Frederic Lorbeer
Frank Dreyer
Sebastian Stober
25
0
0
20 Oct 2024
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS
T. Nguyen
Seymanur Akti
Ngoc-Quan Pham
A. Waibel
18
0
0
19 Oct 2024
Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
Yuzhe Weng
Haotian Wang
Tian Gao
Kewei Li
Shutong Niu
Jun Du
28
0
0
19 Oct 2024
AC-Mix: Self-Supervised Adaptation for Low-Resource Automatic Speech Recognition using Agnostic Contrastive Mixup
Carlos Carvalho
A. Abad
11
0
0
18 Oct 2024
Optimal Transport Maps are Good Voice Converters
Arip Asadulaev
Rostislav Korst
V. Shutov
Alexander Korotin
Yaroslav Grebnyak
Vahe Egiazarian
E. Burnaev
OT
17
1
0
17 Oct 2024
STCON System for the CHiME-8 Challenge
Anton Mitrofanov
Tatiana Prisyach
Tatiana Timofeeva
Sergei Novoselov
M. Korenevsky
...
Dmitriy Miroshnichenko
Nikita Mamaev
Ilya Odegov
Olga Rudnitskaya
A. Romanenko
16
1
0
17 Oct 2024
On the Use of Audio to Improve Dialogue Policies
Daniel Roncel
Federico Costa
Javier Hernando
19
0
0
17 Oct 2024
End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
Natsuo Yamashita
Masaaki Yamamoto
Y. Kawaguchi
16
0
0
17 Oct 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Ashish Seth
Ramaneswaran Selvakumar
S. Sakshi
Sonal Kumar
Sreyan Ghosh
Dinesh Manocha
19
0
0
17 Oct 2024
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
Orchid Chetia Phukan
Devyani Koshal
Swarup Ranjan Behera
Arun Balaji Buduru
Rajesh Sharma
16
0
0
16 Oct 2024
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
Sarthak Jain
Orchid Chetia Phukan
Swarup Ranjan Behera
Arun Balaji Buduru
Rajesh Sharma
CLL
19
0
0
16 Oct 2024
Exploiting Longitudinal Speech Sessions via Voice Assistant Systems for Early Detection of Cognitive Decline
Kristin Qi
Jiatong Shi
Caroline Summerour
J. Batsis
Xiaohui Liang
26
0
0
16 Oct 2024
Investigation of Speaker Representation for Target-Speaker Speech Processing
Takanori Ashihara
Takafumi Moriya
Shota Horiguchi
Junyi Peng
Tsubasa Ochiai
Marc Delcroix
Kohei Matsuura
Hiroshi Sato
16
1
0
15 Oct 2024
JOOCI: a Framework for Learning Comprehensive Speech Representations
Hemant Yadav
R. Shah
Sunayana Sitaram
11
0
0
14 Oct 2024
Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads
Federico Nocentini
T. Besnier
Claudio Ferrari
Sylvain Arguillere
Stefano Berretti
Mohamed Daoudi
53
1
0
14 Oct 2024