2106.07447
Cited By
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
14 June 2021
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
Papers citing "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units"
50 / 430 papers shown
Title
Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People
Haoshuai Zhou
Boxuan Cao
Changgeng Mo
Linkai Li
Shan Xiang Wang
AI4CE
29
0
0
13 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
E. Chng
28
0
0
12 May 2025
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
Junyi Peng
Takanori Ashihara
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Shoko Araki
J. Černocký
ELM
21
0
0
10 May 2025
Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks
Christos Plachouras
Julien Guinot
George Fazekas
Elio Quinton
Emmanouil Benetos
Johan Pauwels
95
1
0
09 May 2025
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
Linrong Pan
Chenglong Jiang
Gaoze Hou
Ying Gao
43
0
0
08 May 2025
Discrete Optimal Transport and Voice Conversion
Anton Selitskiy
Maitreya Kocharekar
OT
72
0
0
07 May 2025
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
130
1
0
07 May 2025
Domain Adversarial Training for Mitigating Gender Bias in Speech-based Mental Health Detection
June-Woo Kim
Haram Yoon
Wonkyo Oh
Dawoon Jung
Sung-Hoon Yoon
Dae-Jin Kim
Dong-Ho Lee
Sang-Yeol Lee
Chan-Mo Yang
36
0
0
06 May 2025
fastabx: A library for efficient computation of ABX discriminability
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
34
0
0
05 May 2025
VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection
Hao Cheng
Zhiwei Zhao
Yichao He
Zhenzhen Hu
Jia Li
M. Wang
Richang Hong
41
0
0
05 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
45
0
0
05 May 2025
Language translation, and change of accent for speech-to-speech task using diffusion model
Abhishek Mishra
Ritesh Sur Chowdhury
Vartul Bahuguna
Isha Pandey
Ganesh Ramakrishnan
DiffM
44
0
0
04 May 2025
Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Xingqun Qi
Yatian Wang
Hengyuan Zhang
J. Pan
Wei Xue
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Yike Guo
SLR
57
0
0
03 May 2025
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Edson Araujo
Andrew Rouditchenko
Yuan Gong
Saurabhchand Bhati
Samuel Thomas
Brian Kingsbury
Leonid Karlinsky
Rogerio Feris
James Glass
32
0
0
02 May 2025
Model See Model Do: Speech-Driven Facial Animation with Style Control
Yifang Pan
Karan Singh
Luiz Gustavo Hafemann
DiffM
50
0
0
02 May 2025
SpectrumFM: A Foundation Model for Intelligent Spectrum Management
F. Zhou
Chunyu Liu
Hao Zhang
W. Wu
Qihui Wu
Derrick Wing Kwan Ng
Tony Q. S. Quek
Chan-Byoung Chae
24
0
0
02 May 2025
KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution
Antoni Bigata
Rodrigo Mira
Stella Bounareli
Michał Stypułkowski
Konstantinos Vougioukas
Stavros Petridis
Maja Pantic
52
0
0
01 May 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
84
0
0
29 Apr 2025
APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech
Zhicheng Lian
Lizhi Wang
Hua Huang
49
0
0
29 Apr 2025
Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget
Xin Li
Kaikai Jia
Hao Sun
Jun Dai
Z. L. Jiang
102
0
0
27 Apr 2025
Circinus: Efficient Query Planner for Compound ML Serving
Banruo Liu
Wei-Yu Lin
Minghao Fang
Yihan Jiang
Fan Lai
LRM
34
0
0
23 Apr 2025
Quantifying Source Speaker Leakage in One-to-One Voice Conversion
Scott Wellington
Xuechen Liu
Junichi Yamagishi
28
0
0
22 Apr 2025
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
Yeona Hong
Hyewon Han
Woo-Jin Chung
Hong-Goo Kang
MQ
28
0
0
21 Apr 2025
EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation
Xiangyue Zhang
Jianfang Li
Jiaxu Zhang
Jianqiang Ren
Liefeng Bo
Zhigang Tu
25
0
0
12 Apr 2025
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
54
0
0
11 Apr 2025
kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization
Keren Shao
K. Chen
Matthew Baas
Shlomo Dubnov
20
0
0
08 Apr 2025
Exploring Local Interpretable Model-Agnostic Explanations for Speech Emotion Recognition with Distribution-Shift
Maja J. Hjuler
Line H. Clemmensen
Sneha Das
FAtt
44
0
0
07 Apr 2025
MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network
Vrushank Ahire
Kunal Shah
Mudasir Nazir Khan
Nikhil Pakhale
L. Sookha
M. A. Ganaie
Abhinav Dhall
65
0
0
16 Mar 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan-Heng Lu
SSL
83
0
0
15 Mar 2025
FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis
Ziqi Ni
Ao Fu
Yi Zhou
61
0
0
06 Mar 2025
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
Lucas Block Medin
Thomas Pellegrini
Lucile Gelin
SSL
61
1
0
06 Mar 2025
ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model
Xuangeng Chu
Nabarun Goswami
Ziteng Cui
Hanqin Wang
Tatsuya Harada
DiffM
71
0
0
27 Feb 2025
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Peng Shen
Xugang Lu
Hisashi Kawai
RALM
60
0
0
24 Feb 2025
TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data
Ege Onur Taga
M. E. Ildiz
Samet Oymak
AI4TS
50
2
0
22 Feb 2025
A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond
Shreya Shukla
Jose Torres
Abhijit Mishra
Jacek Gwizdka
Shounak Roychowdhury
43
0
0
20 Feb 2025
A Dual-Stage Time-Context Network for Speech-Based Alzheimer's Disease Detection
Yifan Gao
Long Guo
Hong Liu
87
0
0
18 Feb 2025
CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao
Wei Kang
Xiaoyu Yang
Fangjun Kuang
Liyong Guo
Han Zhu
Zengrui Jin
Zhaoqing Li
Long Lin
Daniel Povey
51
0
0
17 Feb 2025
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
The Role of Prosody in Spoken Question Answering
Jie Chi
Maureen de Seyssel
Natalie Schluter
49
0
0
08 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
69
0
0
05 Feb 2025
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
Yassine El Kheir
Youness Samih
Suraj Maharjan
Tim Polzehl
Sebastian Möller
67
1
0
05 Feb 2025
A Unit-based System and Dataset for Expressive Direct Speech-to-Speech Translation
Anna Min
Chenxu Hu
Yi Ren
Hang Zhao
61
0
0
01 Feb 2025
FinchGPT: a Transformer based language model for birdsong analysis
Kosei Kobayashi
Kosuke Matsuzaki
Masaya Taniguchi
Keisuke Sakaguchi
Kentaro Inui
Kentaro Abe
68
0
0
01 Feb 2025
Speech Translation Refinement using Large Language Models
Huaixia Dou
Xinyu Tian
Xinglin Lyu
Jie Zhu
Junhui Li
Lifan Guo
116
0
0
28 Jan 2025
Optimized Self-supervised Training with BEST-RQ for Speech Recognition
Ilja Baumann
Dominik Wagner
K. Riedhammer
Tobias Bocklet
69
0
0
28 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters
Sebastian Salwig
Till Kahlke
F. Hirschberger
D. Forster
Jorg Lucke
VLM
84
0
0
21 Jan 2025
Noise-Agnostic Multitask Whisper Training for Reducing False Alarm Errors in Call-for-Help Detection
Myeonghoon Ryu
June-Woo Kim
Minseok Oh
Suji Lee
Han Park
36
0
0
20 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
97
18
0
17 Jan 2025
USED: Universal Speaker Extraction and Diarization
Junyi Ao
Mehmet Sinan Yildirim
Ruijie Tao
Mengyao Ge
Shuai Wang
Yan-min Qian
Haizhou Li
33
5
0
17 Jan 2025