2110.13900
Cited By
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
26 October 2021
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
Zhuo Chen
Jinyu Li
Naoyuki Kanda
Takuya Yoshioka
Xiong Xiao
Jian Wu
Long Zhou
Shuo Ren
Y. Qian
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
Papers citing
"WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing"
50 / 1,021 papers shown
Title
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
22
9
0
23 Jul 2024
Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction
Rithik Sachdev
Zhong-Qiu Wang
Chao-Han Huck Yang
19
3
0
23 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
24
4
0
21 Jul 2024
Using Speech Foundational Models in Loss Functions for Hearing Aid Speech Enhancement
Robert Sutherland
George Close
Thomas Hain
Stefan Goetze
Jon Barker
19
1
0
18 Jul 2024
MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics
Cong Cai
Shan Liang
Xuefei Liu
Kang Zhu
Zhengqi Wen
...
Zhenhua Cheng
Hanzhe Xu
Ruibo Fu
Bin Liu
Yongwei Li
24
3
0
17 Jul 2024
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Haibin Wu
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Daniel Tompkins
...
Canrun Li
Zhen Xiao
Sheng Zhao
Jinyu Li
Naoyuki Kanda
15
6
0
17 Jul 2024
A Language Modeling Approach to Diacritic-Free Hebrew TTS
Amit Roth
A. Turetzky
Yossi Adi
27
2
0
16 Jul 2024
Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
Junqi Zhao
Xubo Liu
Jinzheng Zhao
Yi Yuan
Qiuqiang Kong
Mark D. Plumbley
Wenwu Wang
25
3
0
16 Jul 2024
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
J. Hauret
Malo Olivier
Thomas Joubaud
C. Langrenne
Sarah Poirée
V. Zimpfer
Éric Bavu
62
1
0
16 Jul 2024
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM
VLM
32
100
0
15 Jul 2024
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
Li Lyna Zhang
Ning Jiang
Qing Wang
Yuehong Li
Quan Lu
Lei Xie
27
6
0
14 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen Meng
Furu Wei
27
4
0
11 Jul 2024
VoxMed: One-Step Respiratory Disease Classifier using Digital Stethoscope Sounds
Paridhi Mundra
Manik Sharma
Yashwardhan Chaudhuri
Orchid Chetia Phukan
Arun Balaji Buduru
17
0
0
10 Jul 2024
Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models
Yi-Cheng Lin
T. Lin
Chih-Kai Yang
Ke-Han Lu
Wei-Chih Chen
Chun-Yi Kuan
Hung-yi Lee
29
1
0
09 Jul 2024
MSP-Podcast SER Challenge 2024: L'antenne du Ventoux Multimodal Self-Supervised Learning for Speech Emotion Recognition
J. Duret
Mickael Rouvier
Yannick Esteve
33
0
0
08 Jul 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yuancheng Wang
Kai Chen
Pengyuan Zhang
Zhizheng Wu
25
33
0
07 Jul 2024
A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition
Shreya G. Upadhyay
Carlos Busso
Chi-Chun Lee
31
3
0
06 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
39
19
0
05 Jul 2024
Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect
Salima Mdhaffar
Haroun Elleuch
Fethi Bougares
Yannick Esteve
46
0
0
05 Jul 2024
Who Finds This Voice Attractive? A Large-Scale Experiment Using In-the-Wild Data
Hitoshi Suda
Aya Watanabe
Shinnosuke Takamichi
18
0
0
05 Jul 2024
MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production
Jian Ma
Wenguan Wang
Yi Yang
Feng Zheng
45
1
0
04 Jul 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li
Feiyu Shen
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
29
2
0
04 Jul 2024
Improving Self-supervised Pre-training using Accent-Specific Codebooks
Darshan Prabhu
Abhishek Gupta
Omkar Nitsure
P. Jyothi
Sriram Ganapathy
SSL
39
0
0
04 Jul 2024
Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems
Chin Yuen Kwok
J. Yip
Eng Siong Chng
CLL
27
1
0
04 Jul 2024
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
Kunal Dhawan
Nithin Rao Koluguri
Ante Jukić
Ryan Langman
Jagadeesh Balam
Boris Ginsburg
36
1
0
03 Jul 2024
Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
Shujie Hu
Xurong Xie
Mengzhe Geng
Zengrui Jin
Jiajun Deng
...
Yi Wang
Mingyu Cui
Tianzi Wang
Helen Meng
Xunying Liu
35
5
0
03 Jul 2024
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Varun Krishna
Sriram Ganapathy
SSL
17
1
0
02 Jul 2024
Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model
Yu-Kuan Fu
Cheng-Kuang Lee
Hsiu-Hsuan Wang
Hung-yi Lee
22
0
0
02 Jul 2024
ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024
Ruibo Fu
Rui Liu
Chunyu Qiang
Yingming Gao
Yi Lu
...
Chen Zhang
Hui Bu
Yukun Liu
Xin Qi
Guanjun Li
19
5
0
01 Jul 2024
Towards Robust Speech Representation Learning for Thousands of Languages
William Chen
Wangyou Zhang
Yifan Peng
Xinjian Li
Jinchuan Tian
Jiatong Shi
Xuankai Chang
Soumi Maiti
Karen Livescu
Shinji Watanabe
ELM
25
6
0
30 Jun 2024
NAIST Simultaneous Speech Translation System for IWSLT 2024
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Tomoya Yanagita
...
Haotian Tan
Makoto Sakai
S. Sakti
Katsuhito Sudoh
Satoshi Nakamura
20
1
0
30 Jun 2024
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
Yinlin Guo
Yening Lv
Jinqiao Dou
Yan Zhang
Yuehai Wang
18
0
0
30 Jun 2024
Factor-Conditioned Speaking-Style Captioning
Atsushi Ando
Takafumi Moriya
Shota Horiguchi
Ryo Masumura
22
0
0
27 Jun 2024
WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model
Yi Zhu
Tiago H. Falk
MedIm
23
0
0
26 Jun 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
22
46
0
26 Jun 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
37
11
0
25 Jun 2024
Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights
Hao Yang
Lizhen Qu
Ehsan Shareghi
Gholamreza Haffari
23
0
0
25 Jun 2024
Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator
Woo-Jin Chung
Hong-Goo Kang
21
1
0
25 Jun 2024
Self-Supervised Embeddings for Detecting Individual Symptoms of Depression
Sri Harsha Dumpala
Katerina Dikaios
Abraham Nunes
Frank Rudzicz
Rudolf Uher
Sageev Oore
SSL
28
1
0
25 Jun 2024
Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024
Sai Koneru
Thai-Binh Nguyen
Ngoc-Quan Pham
Danni Liu
Zhaolin Li
Alexander Waibel
Jan Niehues
OffRL
20
1
0
24 Jun 2024
AND: Audio Network Dissection for Interpreting Deep Acoustic Models
Tung-Yu Wu
Yu-Xiang Lin
Tsui-Wei Weng
29
1
0
24 Jun 2024
Speech Analysis of Language Varieties in Italy
Moreno La Quatra
Alkis Koudounas
Elena Baralis
Sabato Marco Siniscalchi
25
3
0
22 Jun 2024
Multimodal Segmentation for Vocal Tract Modeling
Rishi Jain
Bohan Yu
Peter Wu
Tejas S. Prabhune
Gopala Anumanchipalli
25
1
0
22 Jun 2024
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers
Yakun Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Guanrou Yang
Xie Chen
AuLLM
22
3
0
22 Jun 2024
Speech Emotion Recognition under Resource Constraints with Data Distillation
Yi Chang
Zhao Ren
Zhonghao Zhao
Thanh Tam Nguyen
Kun Qian
Tanja Schultz
Björn W. Schuller
14
0
0
21 Jun 2024
Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis
Md. Saiful Islam
Tariq Adnan
Jan Freyberg
Sangwu Lee
Abdelrahman Abdelkader
...
Cathe Schwartz
Karen Jaffe
Ruth B. Schneider
E. R. Dorsey
Ehsan Hoque
68
0
0
21 Jun 2024
Voice Disorder Analysis: a Transformer-based Approach
Alkis Koudounas
Gabriele Ciravegna
M. Fantini
G. Succo
Erika Crosetti
Tania Cerquitelli
Elena Baralis
19
3
0
20 Jun 2024
DASB -- Discrete Audio and Speech Benchmark
Pooneh Mousavi
Luca Della Libera
J. Duret
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
22
12
0
20 Jun 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models
Jing Xu
Minglin Wu
Xixin Wu
Helen Meng
CLL
27
1
0
20 Jun 2024
Children's Speech Recognition through Discrete Token Enhancement
Vrunda N. Sukhadia
Shammur A. Chowdhury
35
1
0
19 Jun 2024