2110.13900
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
ArXiv
Papers citing
"WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"
50 / 1,019 papers shown
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
Lucas Block Medin
Thomas Pellegrini
Lucile Gelin
SSL
51
1
0
06 Mar 2025
Qieemo: Speech Is All You Need in the Emotion Recognition in Conversations
Jinming Chen
Jingyi Fang
Yuanzhong Zheng
Yaoxuan Wang
Haojun Fei
41
0
0
05 Mar 2025
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
Antoni Bigata
Michał Stypułkowski
Rodrigo Mira
Stella Bounareli
Konstantinos Vougioukas
Zoe Landgraf
Nikita Drobyshev
Maciej Ziȩba
Stavros Petridis
M. Pantic
DiffM
VGen
61
2
0
03 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
40
0
0
02 Mar 2025
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
Boyi Kang
Xinfa Zhu
Zihan Zhang
Zhen Ye
Mingshuai Liu
...
Jun Chen
Longshuai Xiao
Chao Weng
Wei Xue
Lei Xie
AuLLM
55
3
0
01 Mar 2025
DIN-CTS: Low-Complexity Depthwise-Inception Neural Network with Contrastive Training Strategy for Deepfake Speech Detection
L. D. Pham
Dat Tran
Florian Skopik
Alexander Schindler
Silvia Poletti
David Fischinger
Martin Boyer
46
1
0
27 Feb 2025
Residual Speech Embeddings for Tone Classification: Removing Linguistic Content to Enhance Paralinguistic Analysis
Hamdan Al Ahbabi
Gautier Marti
Saeed AlMarri
Ibrahim Elfadel
47
0
0
26 Feb 2025
Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders
Weiqiao Shan
Y. Li
Yuhao Zhang
Yingfeng Luo
Chen Xu
...
Y. Lu
M. Zhang
Hao Yang
Tong Xiao
Jingbo Zhu
AuLLM
57
0
0
24 Feb 2025
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Xilin Jiang
Sukru Samet Dindar
Vishal B. Choudhari
Stephan Bickel
A. Mehta
Guy M McKhann
A. Flinker
D. Friedman
N. Mesgarani
30
1
0
24 Feb 2025
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction
Tianpeng Li
J. Liu
Tao Zhang
Yuanbo Fang
Da Pan
...
Guosheng Dong
Jianhua Xu
Haoze Sun
Zenan Zhou
Weipeng Chen
AuLLM
45
3
0
24 Feb 2025
voc2vec: A Foundation Model for Non-Verbal Vocalization
Alkis Koudounas
Moreno La Quatra
Marco Sabato Siniscalchi
Elena Baralis
36
0
0
22 Feb 2025
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation
Yoonjin Chung
Pilsun Eu
Junwon Lee
Keunwoo Choi
Juhan Nam
Ben Sangbae Chon
EGVM
57
3
0
21 Feb 2025
How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations
Hyunji Lee
Danni Liu
Supriti Sinhamahapatra
Jan Niehues
103
0
0
21 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Y. Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
H. Li
AuLLM
SyDa
VLM
98
0
0
18 Feb 2025
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
65
0
0
17 Feb 2025
Demographic Attributes Prediction from Speech Using WavLM Embeddings
Yuchen Yang
Thomas Thebaud
Najim Dehak
42
0
0
17 Feb 2025
Efficient Finetuning for Dimensional Speech Emotion Recognition in the Age of Transformers
Aneesha Sampath
James Tavernor
E. Provost
36
0
0
17 Feb 2025
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
Hui Wang
Shujie Liu
Lingwei Meng
J. Li
Yifan Yang
...
Yanqing Liu
Haoqin Sun
Jiaming Zhou
Yan Lu
Yong Qin
43
0
0
16 Feb 2025
BrainWavLM: Fine-tuning Speech Representations with Brain Responses to Language
Nishitha Vattikonda
A. Vaidya
Richard Antonello
Alexander G. Huth
91
0
0
13 Feb 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Yuexian Zou
72
2
0
10 Feb 2025
Evaluation of Deep Audio Representations for Hearables
Fabian Gröger
Pascal Baumann
L. Amruthalingam
Laurent Simon
Ruksana Giurda
Simone Lionetti
72
0
0
10 Feb 2025
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models
Jing-Xuan Zhang
Genshun Wan
Jianqing Gao
Zhen-Hua Ling
42
0
0
09 Feb 2025
The Role of Prosody in Spoken Question Answering
Jie Chi
Maureen de Seyssel
Natalie Schluter
44
0
0
08 Feb 2025
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond
Mardhiyah Sanni
Tassallah Abdullahi
Devendra D. Kayande
Emmanuel Ayodele
Naome A. Etori
...
Chibuzor Okocha
L. Ismaila
Folafunmi Omofoye
Boluwatife A. Adewale
Tobi Olatunji
80
0
0
06 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
62
0
0
05 Feb 2025
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Brandon Woodard
Margarita Geleta
Joseph J. LaViola Jr.
Andrea Fanelli
Rhonda Wilson
55
1
0
05 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
73
0
0
05 Feb 2025
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
Yassine El Kheir
Youness Samih
Suraj Maharjan
Tim Polzehl
Sebastian Möller
57
1
0
05 Feb 2025
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Y. Wang
Kai Chen
Pengyuan Zhang
Z. Wu
AuLLM
47
3
0
28 Jan 2025
Optimized Self-supervised Training with BEST-RQ for Speech Recognition
Ilja Baumann
Dominik Wagner
K. Riedhammer
Tobias Bocklet
67
0
0
28 Jan 2025
Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings
Igor Abramovski
Alon Vinnikov
Shalev Shaer
Naoyuki Kanda
Xiaofei Wang
Amir Ivry
Eyal Krupka
34
0
0
28 Jan 2025
Safe Gradient Flow for Bilevel Optimization
Sina Sharifi
Nazanin Abolfazli
E. Y. Hamedani
Mahyar Fazlyab
31
1
0
27 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
Soumya Dutta
Sriram Ganapathy
31
2
0
20 Jan 2025
How Redundant Is the Transformer Stack in Speech Representation Models?
Teresa Dorszewski
Albert Kjøller Jacobsen
Lenka Tětková
Lars Kai Hansen
104
0
0
20 Jan 2025
Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR
Karl El Hajal
Enno Hermann
Ajinkya Kulkarni
Mathew Magimai.-Doss
31
0
0
20 Jan 2025
Target Speaker ASR with Whisper
Alexander Polok
Dominik Klement
Matthew Wiesner
Sanjeev Khudanpur
J. Černocký
L. Burget
91
1
0
17 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
J. Zhang
Lu Lu
Y. Wang
Haizhou Li
Z. Wu
AuLLM
71
16
0
17 Jan 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
38
0
0
11 Jan 2025
A Survey on Spoken Italian Datasets and Corpora
Marco Giordano
Claudia Rinaldi
36
0
0
11 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
31
0
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
70
2
0
10 Jan 2025
HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
Dyah A. M. G. Wisnu
Stefano Rini
Ryandhimas E. Zezario
Hsin-Min Wang
Yu Tsao
42
0
0
10 Jan 2025
Improving Speech Emotion Recognition in Under-Resourced Languages via Speech-to-Speech Translation with Bootstrapping Data Selection
Hsi-Che Lin
Yi-Cheng Lin
Huang-Cheng Chou
Hung-yi Lee
19
0
0
08 Jan 2025
Spectral-Aware Low-Rank Adaptation for Speaker Verification
Zhe Li
Man-Wai Mak
Mert Pilanci
Hung-yi Lee
H. Meng
41
0
0
07 Jan 2025
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
H. S. Bovbjerg
Jan Østergaard
Jesper Jensen
Zheng-Hua Tan
36
0
0
06 Jan 2025
Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition
Ruoyu Zhao
Xiantao Jiang
Fei Yu
Victor C.M. Leung
Tao Wang
S. Zhang
27
0
0
06 Jan 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
68
3
0
03 Jan 2025
Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
Zixiang Wan
Ziyue Qiu
Yiyang Liu
Wei-Qiang Zhang
26
0
0
31 Dec 2024
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
40
1
0
26 Dec 2024