ResearchTrend.AI

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,020 papers shown
Progressive Residual Extraction based Pre-training for Speech Representation Learning
Tianrui Wang
Jin Li
Ziyang Ma
Rui Cao
Xie Chen
...
Meng Ge
Xiaobao Wang
Yuguang Wang
Jianwu Dang
Nyima Tashi
SSL
30
0
0
31 Aug 2024
Advancing Multi-talker ASR Performance with Large Language Models
Mohan Shi
Zengrui Jin
Yaoxun Xu
Yong Xu
Shi-Xiong Zhang
Kun Wei
Yiwen Shao
Chunlei Zhang
Dong Yu
20
0
0
30 Aug 2024
Utilizing Speaker Profiles for Impersonation Audio Detection
Hao Gu
JiangYan Yi
Chenglong Wang
Yong Ren
Jianhua Tao
Xinrui Yan
Yujie Chen
Xiaohui Zhang
25
0
0
30 Aug 2024
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
46
0
0
30 Aug 2024
Audio xLSTMs: Learning Self-Supervised Audio Representations with xLSTMs
Sarthak Yadav
Sergios Theodoridis
Z. Tan
30
2
0
29 Aug 2024
WHISMA: A Speech-LLM to Perform Zero-shot Spoken Language Understanding
Mohan Li
Cong-Thanh Do
Simon Keizer
Youmna Farag
Svetlana Stoyanchev
R. Doddipatla
22
2
0
29 Aug 2024
Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis
Zehai Tu
Guangyan Zhang
Yiting Lu
Adaeze Adigwe
Simon King
Yiwen Guo
14
0
0
29 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
AuLLM
30
1
0
29 Aug 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
45
32
0
29 Aug 2024
Spoofing-Robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group's Approach for ASVspoof5 Challenge
Oğuzhan Kurnaz
Selim Can Demirtaş
Aykut Buker
Jagabandhu Mishra
Cemal Hanilçi
26
0
0
28 Aug 2024
Feature Representations for Automatic Meerkat Vocalization Classification
Imen Ben Mahmoud
Eklavya Sarkar
Marta Manser
Mathew Magimai.-Doss
20
1
0
27 Aug 2024
The VoxCeleb Speaker Recognition Challenge: A Retrospective
Jaesung Huh
Joon Son Chung
Arsha Nagrani
A. Brown
Jee-weon Jung
Daniel Garcia-Romero
Andrew Zisserman
23
3
0
27 Aug 2024
Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples
Zhenyu Wang
John H. L. Hansen
AAML
28
1
0
23 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna C. Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
14
1
0
23 Aug 2024
Improvement Speaker Similarity for Zero-Shot Any-to-Any Voice Conversion of Whispered and Regular Speech
Anastasia Avdeeva
Aleksei Gusev
17
0
0
21 Aug 2024
BUT Systems and Analyses for the ASVspoof 5 Challenge
Johan Rohdin
Lin Zhang
Oldřich Plchot
Vojtěch Staněk
David Mihola
...
Themos Stafylakis
Dmitriy Beveraki
Anna Silnova
Jan Brukner
Lukáš Burget
28
1
0
20 Aug 2024
A Noval Feature via Color Quantisation for Fake Audio Detection
Zhiyong Wang
Xiaopeng Wang
Yuankun Xie
Ruibo Fu
Zhengqi Wen
...
Guanjun Li
Xin Qi
Yi Lu
Xuefei Liu
Yongwei Li
26
0
0
20 Aug 2024
Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
40
0
0
20 Aug 2024
ASASVIcomtech: The Vicomtech-UGR Speech Deepfake Detection and SASV Systems for the ASVspoof5 Challenge
Juan M. Martín-Donas
Eros Roselló
A. Gómez
Aitor Álvarez
Iván López-Espejo
Antonio M. Peinado
28
0
0
19 Aug 2024
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision
Zhijun Jia
Huaying Xue
Xiulian Peng
Yan Lu
11
1
0
19 Aug 2024
SZU-AFS Antispoofing System for the ASVspoof 5 Challenge
Yuxiong Xu
Jiafeng Zhong
Sengui Zheng
Zefeng Liu
Bin Li
29
2
0
19 Aug 2024
Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition
Qifei Li
Yingming Gao
Yuhua Wen
Cong Wang
Ya Li
17
0
0
18 Aug 2024
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
Samuele Cornell
Jordan Darefsky
Zhiyao Duan
Shinji Watanabe
SyDa
53
4
0
17 Aug 2024
Convexity-based Pruning of Speech Representation Models
Teresa Dorszewski
Lenka Tětková
Lars Kai Hansen
12
2
0
16 Aug 2024
Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval
Lifeng Zhou
Yuke Li
Rui Deng
Yuting Yang
Haoqi Zhu
21
0
0
15 Aug 2024
SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition
Mohamed Osman
Daniel Z. Kaplan
Tamer Nadeem
19
1
0
14 Aug 2024
CMU's IWSLT 2024 Simultaneous Speech Translation System
Xi Xu
Siqi Ouyang
Brian Yan
Patrick Fernandes
William Chen
Lei Li
Graham Neubig
Shinji Watanabe
21
1
0
14 Aug 2024
Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge
Yuankun Xie
Xiaopeng Wang
Zhiyong Wang
Ruibo Fu
Zhengqi Wen
Haonan Cheng
Long Ye
27
1
0
13 Aug 2024
Heterogeneous Space Fusion and Dual-Dimension Attention: A New Paradigm for Speech Enhancement
Tao Zheng
Liejun Wang
Yinfeng Yu
18
1
0
13 Aug 2024
BSS-CFFMA: Cross-Domain Feature Fusion and Multi-Attention Speech Enhancement Network based on Self-Supervised Embedding
Alimjan Mattursun
Liejun Wang
Yinfeng Yu
20
2
0
13 Aug 2024
Enhancing Dialogue Speech Recognition with Robust Contextual Awareness via Noise Representation Learning
Wonjun Lee
San Kim
Gary Geunbae Lee
22
0
0
12 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
Longbiao Wang
Jianwu Dang
Jianhua Tao
AI4TS
31
0
0
11 Aug 2024
Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-based Speech Enhancement
Muhammad Salman Khan
Moreno La Quatra
Kuo-Hsuan Hung
Szu-Wei Fu
Sabato Marco Siniscalchi
Yu Tsao
16
2
0
08 Aug 2024
Survey: Transformer-based Models in Data Modality Conversion
Elyas Rashno
Amir Eskandari
Aman Anand
F. Zulkernine
MedIm
25
0
0
08 Aug 2024
MulliVC: Multi-lingual Voice Conversion With Cycle Consistency
Jiawei Huang
Chen Zhang
Yi Ren
Ziyue Jiang
Zhenhui Ye
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
22
2
0
08 Aug 2024
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation
Xiaofeng Mao
Zhengkai Jiang
Qilin Wang
Chencan Fu
Jiangning Zhang
Jiafu Wu
Yabiao Wang
Chengjie Wang
Wei Li
Mingmin Chi
70
4
0
06 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier
Marie Tahon
Anthony Larcher
Yannick Esteve
32
0
0
05 Aug 2024
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model
Xiangyu Fan
Jiaqi Li
Zhiqian Lin
Weiye Xiao
Lei Yang
CVBM
VGen
26
0
0
01 Aug 2024
DiM-Gesture: Co-Speech Gesture Generation with Adaptive Layer Normalization Mamba-2 framework
Fan Zhang
Naye Ji
Fuxing Gao
Bozuo Zhao
Jingmei Wu
...
Zhenqing Ye
Jiayang Zhu
WeiFan Zhong
Leyao Yan
Xiaomeng Ma
27
0
0
01 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
31
1
0
01 Aug 2024
Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
Kohei Matsuura
Takanori Ashihara
Takafumi Moriya
Masato Mimura
Takatomo Kano
A. Ogawa
Marc Delcroix
19
0
0
01 Aug 2024
Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism
Jiafeng Zhong
Bin Li
Jiangyan Yi
19
1
0
31 Jul 2024
Confidence Estimation for Automatic Detection of Depression and Alzheimer's Disease Based on Clinical Interviews
Wen Wu
C. Zhang
P. Woodland
26
1
0
29 Jul 2024
MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion
Chencan Fu
Yabiao Wang
Jiangning Zhang
Zhengkai Jiang
Xiaofeng Mao
Jiafu Wu
Weijian Cao
Chengjie Wang
Yanhao Ge
Yong Liu
Mamba
35
2
0
29 Jul 2024
ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement
Zhong-Qiu Wang
13
1
0
28 Jul 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
Nakamasa Inoue
Shinta Otake
Takumi Hirose
Masanari Ohi
Rei Kawakami
20
1
0
28 Jul 2024
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection
Yi Zhu
Surya Koppisetti
Trang Tran
Gaurav Bharaj
34
8
0
26 Jul 2024
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
Seong-Gyun Leem
Daniel Fulford
J. Onnela
David Gard
Carlos Busso
20
0
0
25 Jul 2024
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
21
0
0
24 Jul 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
20
9
0
23 Jul 2024