WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
    SSL
arXiv: 2110.13900

Papers citing "WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"

50 / 1,019 papers shown
Music Genre Classification using Large Language Models
Mohamed El Amine Meguenani
Alceu de Souza Britto Jr.
A. L. Koerich
23
0
0
10 Oct 2024
Audio Explanation Synthesis with Generative Foundation Models
Alican Akman
Qiyang Sun
Björn W. Schuller
15
0
0
10 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
45
1
0
09 Oct 2024
Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge
Yi Zhu
C. Goel
Surya Koppisetti
Trang Tran
Ankur Kumar
Gaurav Bharaj
AAML
20
0
0
09 Oct 2024
LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
Di Liang
Xiaofei Li
14
0
0
09 Oct 2024
Mamba-based Segmentation Model for Speaker Diarization
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
Shoko Araki
Mamba
29
0
0
09 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
30
0
0
09 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
78
0
0
09 Oct 2024
FINALLY: fast and universal speech enhancement with studio-like quality
Nicholas Babaev
Kirill Tamogashev
Azat Saginbaev
Ivan Shchekotov
Hanbin Bae
Hosang Sung
WonJun Lee
Hoon-Young Cho
Pavel Andreev
17
2
0
08 Oct 2024
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis
Yuto Nishimura
Takumi Hirose
Masanari Ohi
Hideki Nakayama
Nakamasa Inoue
VLM
21
1
0
06 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Alan Baade
Puyuan Peng
David F. Harwath
37
3
0
05 Oct 2024
Generative Semantic Communication for Text-to-Speech Synthesis
Jiahao Zheng
Jinke Ren
Peng Xu
Zhihao Yuan
Jie Xu
Fangxin Wang
Gui Gui
Shuguang Cui
18
0
0
04 Oct 2024
Reverb: Open-Source ASR and Diarization from Rev
Nishchal Bhandari
Danny Chen
Miguel Ángel del Río Fernández
Natalie Delworth
Jennifer Drexler Fox
...
Ondrej Novotný
Jan Profant
Nan Qin
Martin Ratajczak
Jean-Philippe Robichaud
VLM
26
1
0
04 Oct 2024
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
Yichen Lu
Jiaqi Song
Chao-Han Huck Yang
Shinji Watanabe
16
0
0
03 Oct 2024
Distilling an End-to-End Voice Assistant Without Instruction Training Data
William B. Held
Ella Li
Michael Joseph Ryan
Weiyan Shi
Yanzhe Zhang
Diyi Yang
AuLLM
29
8
0
03 Oct 2024
NTU-NPU System for Voice Privacy 2024 Challenge
Nikita Kuzmin
Hieu-Thi Luong
Jixun Yao
Lei Xie
Kong Aik Lee
Eng Siong Chng
36
1
0
03 Oct 2024
A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker's Shadowings
Haopeng Geng
Daisuke Saito
N. Minematsu
13
1
0
03 Oct 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Zae Myung Kim
Dongyeop Kang
Yao-Yi Chiang
Alfred Hero
23
0
0
02 Oct 2024
Improving curriculum learning for target speaker extraction with synthetic speakers
Yun Liu
Xuechen Liu
Junichi Yamagishi
18
0
0
01 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
57
14
0
01 Oct 2024
AfriHuBERT: A self-supervised speech representation model for African languages
Jesujoba Oluwadara Alabi
Xuechen Liu
Dietrich Klakow
Junichi Yamagishi
VLM
28
0
0
30 Sep 2024
Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models
Wenrui Liu
Zhifang Guo
Jin Xu
Yuanjun Lv
Yunfei Chu
Zhou Zhao
Junyang Lin
36
1
0
28 Sep 2024
XWSB: A Blend System Utilizing XLS-R and WavLM with SLS Classifier detection system for SVDD 2024 Challenge
Qishan Zhang
Shuangbing Wen
Fangke Yan
Tao Hu
Jun Li
15
2
0
27 Sep 2024
EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis
Haoyu Wang
Chunyu Qiang
Tianrui Wang
Cheng Gong
Qiuyu Liu
Yu Jiang
Xiaobao Wang
Chenyang Wang
Chen Zhang
26
0
0
27 Sep 2024
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
Pengfei Cai
Yan Song
Nan Jiang
Qing Gu
Ian Mcloughlin
20
2
0
26 Sep 2024
Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control
Ryuichi Yamamoto
Yuma Shirahata
Masaya Kawamura
Kentaro Tachibana
DiffM
19
2
0
26 Sep 2024
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
Giuseppe Ruggiero
Matteo Testa
Jurgen Van de Walle
Luigi Di Caro
16
0
0
25 Sep 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM
LM&MA
65
5
0
25 Sep 2024
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions
Kun Zhou
You Zhang
Shengkui Zhao
Hao Wang
Zexu Pan
...
Chongjia Ni
Yukun Ma
Trung Hieu Nguyen
J. Yip
Bin Ma
42
4
0
25 Sep 2024
Cross-Lingual Speech Emotion Recognition: Humans vs. Self-Supervised Models
Zhichen Han
Tianqi Geng
Hui Feng
Jiahong Yuan
Korin Richmond
Yuanchao Li
28
1
0
25 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
15
0
0
25 Sep 2024
Evaluation of state-of-the-art ASR Models in Child-Adult Interactions
Aditya Ashvin
Rimita Lahiri
Aditya Kommineni
Somer Bishop
C. Lord
Sudarsana Reddy Kadiri
Shrikanth Narayanan
21
0
0
24 Sep 2024
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration
Pin-Jui Ku
Alexander H. Liu
Roman Korostik
Sung-Feng Huang
Szu-Wei Fu
Ante Jukić
26
2
0
24 Sep 2024
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
Yang Yuhang
Peng Yizhou
Eng Siong Chng
Xionghu Zhong
AuLLM
AI4CE
16
0
0
24 Sep 2024
Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM
Fengrun Zhang
Wang Geng
Hukai Huang
Cheng Yi
He Qu
AuLLM
MoE
20
1
0
24 Sep 2024
Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection
Orchid Chetia Phukan
Girish
Mohd Mujtaba Akhtar
Swarup Ranjan Behera
Nitin Choudhury
Arun Balaji Buduru
Rajesh Sharma
S. R Mahadeva Prasanna
13
0
0
24 Sep 2024
Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample
Zhiyong Chen
Zhiqi Ai
Xinnuo Li
Shugong Xu
23
0
0
24 Sep 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang
Ziyue Jiang
Ruiqi Li
Changhao Pan
Jinzheng He
Rongjie Huang
Chuxin Wang
Zhou Zhao
DiffM
VLM
28
4
0
24 Sep 2024
Speech2rtMRI: Speech-Guided Diffusion Model for Real-time MRI Video of the Vocal Tract during Speech
Hong Nguyen
Sean Foley
Kevin Huang
Xuan Shi
Tiantian Feng
Shrikanth Narayanan
VGen
DiffM
19
1
0
23 Sep 2024
CA-MHFA: A Context-Aware Multi-Head Factorized Attentive Pooling for SSL-Based Speaker Verification
Junyi Peng
Ladislav Mošner
Lin Zhang
Oldrich Plchot
Themos Stafylakis
Lukáš Burget
Jan Černocký
9
0
0
23 Sep 2024
Semi-supervised Learning For Robust Speech Evaluation
Huayun Zhang
Jeremy H. M. Wong
Geyu Lin
Nancy F. Chen
15
0
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
Canh Vu
40
3
0
23 Sep 2024
What Are They Doing? Joint Audio-Speech Co-Reasoning
Yingzhi Wang
Pooneh Mousavi
Artem Ploujnikov
Mirco Ravanelli
AuLLM
41
0
0
22 Sep 2024
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
Orchid Chetia Phukan
Mohd Mujtaba Akhtar
Girish
Swarup Ranjan Behera
Sishir Kalita
Arun Balaji Buduru
Rajesh Sharma
S. R Mahadeva Prasanna
EgoV
21
0
0
21 Sep 2024
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
Orchid Chetia Phukan
Sarthak Jain
Swarup Ranjan Behera
Arun Balaji Buduru
Rajesh Sharma
S. R Mahadeva Prasanna
16
0
0
21 Sep 2024
GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
Yu Zhang
Changhao Pan
Wenxiang Guo
Ruiqi Li
Z. Zhu
...
Yuxin Chen
Chen Yang
Jiecheng Zhou
Xinyu Cheng
Zhou Zhao
13
6
0
20 Sep 2024
Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space
Sebastião Quintas
Isabelle Ferrané
Thomas Pellegrini
23
0
0
19 Sep 2024
NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization
Zhikang Niu
Sanyuan Chen
Long Zhou
Ziyang Ma
Xie Chen
Shujie Liu
21
2
0
19 Sep 2024
FruitsMusic: A Real-World Corpus of Japanese Idol-Group Songs
Hitoshi Suda
Shunsuke Yoshida
Tomohiko Nakamura
Satoru Fukayama
Jun Ogata
21
0
0
19 Sep 2024
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
Carlos Hernandez-Olivan
Marc Delcroix
Tsubasa Ochiai
Daisuke Niizumi
Naohiro Tawara
Tomohiro Nakatani
Shoko Araki
21
2
0
19 Sep 2024