ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.07447
  4. Cited By
HuBERT: Self-Supervised Speech Representation Learning by Masked
  Prediction of Hidden Units

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

14 June 2021
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
    SSL
ArXivPDFHTML

Papers citing "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units"

50 / 445 papers shown
Title
WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
Rajath Rao
Adithya V Ganesan
O. Kjell
Jonah Luby
Akshay Raghavan
...
B. Luft
Camilo Ruggero
Neville Ryant
R. Kotov
H. A. Schwartz
32
0
0
15 Jan 2025
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
LUPET: Incorporating Hierarchical Information Path into Multilingual ASR
Wei Liu
Jingyong Hou
Dong Yang
Muyong Cao
Tan Lee
70
1
0
10 Jan 2025
HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
Dyah A. M. G. Wisnu
Stefano Rini
Ryandhimas E. Zezario
Hsin-Min Wang
Yu Tsao
49
0
0
10 Jan 2025
Spectral-Aware Low-Rank Adaptation for Speaker Verification
Spectral-Aware Low-Rank Adaptation for Speaker Verification
Zhe Li
Man-Wai Mak
Mert Pilanci
Hung-yi Lee
H. Meng
41
0
0
07 Jan 2025
Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
Bird Vocalization Embedding Extraction Using Self-Supervised Disentangled Representation Learning
Runwu Shi
Katsutoshi Itoyama
K. Nakadai
SSL
DRL
39
1
0
31 Dec 2024
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)
S. Oota
Zijiao Chen
Manish Gupta
R. Bapi
G. Jobard
F. Alexandre
X. Hinaut
3DV
AI4CE
44
11
0
31 Dec 2024
Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
Zixiang Wan
Ziyue Qiu
Yiyang Liu
Wei-Qiang Zhang
26
0
0
31 Dec 2024
DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering
DiFiC: Your Diffusion Model Holds the Secret to Fine-Grained Clustering
Ruohong Yang
Peng Hu
Xi Peng
Xiting Liu
Yunfan Li
34
0
0
25 Dec 2024
DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer
  Scaling Factor Search
DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
Lei Yang
Shaoyang Xu
Deyi Xiong
28
0
0
25 Dec 2024
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Chenyu Yang
Shuai Wang
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Y. Xu
Yizhi Zhou
Haina Zhu
H. Li
KELM
145
1
0
18 Dec 2024
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp
Andreas Triantafyllopoulos
M. Milling
Björn Schuller
85
0
0
16 Dec 2024
How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario
How to Learn a New Language? An Efficient Solution for Self-Supervised Learning Models Unseen Languages Adaption in Low-Resource Scenario
Shih-Heng Wang
Zih-Ching Chen
Jiatong Shi
Ming To Chuang
Guan-Ting Lin
Kuan Po Huang
David F. Harwath
Shang-Wen Li
Hung-yi Lee
76
1
0
27 Nov 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
39
3
0
04 Nov 2024
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding
Bohan Li
Hankun Wang
Situo Zhang
Yiwei Guo
Kai Yu
33
5
0
29 Oct 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr .Zelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
54
2
0
23 Oct 2024
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS
Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS
T. Nguyen
Seymanur Akti
Ngoc-Quan Pham
A. Waibel
21
0
0
19 Oct 2024
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
Hanbo Cheng
Limin Lin
Chenyu Liu
Pengcheng Xia
Pengfei Hu
Jiefeng Ma
Jun Du
Jia Pan
DiffM
VGen
101
0
0
17 Oct 2024
Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads
Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads
Federico Nocentini
T. Besnier
Claudio Ferrari
Sylvain Arguillere
Stefano Berretti
Mohamed Daoudi
59
1
0
14 Oct 2024
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Omer Moussa
Dietrich Klakow
Mariya Toneva
37
3
0
11 Oct 2024
The First VoicePrivacy Attacker Challenge Evaluation Plan
The First VoicePrivacy Attacker Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Emmanuel Vincent
Junichi Yamagishi
108
2
0
09 Oct 2024
SCOREQ: Speech Quality Assessment with Contrastive Regression
SCOREQ: Speech Quality Assessment with Contrastive Regression
Alessandro Ragano
Jan Skoglund
Andrew Hines
38
6
0
09 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
32
0
0
09 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
80
0
0
09 Oct 2024
EmoGene: Audio-Driven Emotional 3D Talking-Head Generation
EmoGene: Audio-Driven Emotional 3D Talking-Head Generation
Wenqing Wang
Yun Fu
VGen
74
0
0
07 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
Chi-Keung Tang
DiffM
VGen
LLMAG
41
4
0
04 Oct 2024
Recent Advances in Speech Language Models: A Survey
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
59
14
0
01 Oct 2024
Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head
  Generation
Learning Frame-Wise Emotion Intensity for Audio-Driven Talking-Head Generation
Jingyi Xu
Hieu Le
Zhixin Shu
Yang Wang
Yi-Hsuan Tsai
Dimitris Samaras
29
0
0
29 Sep 2024
Probing mental health information in speech foundation models
Probing mental health information in speech foundation models
Marc de Gennes
Adrien Lesage
Martin Denais
Xuan-Nga Cao
Simon Chang
Pierre Van Remoortere
Cyrille Dakhlia
Rachid Riad
16
0
0
27 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
51
11
0
26 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
Hang Xu
AuLLM
MLLM
VLM
67
21
0
26 Sep 2024
Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling
Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling
Yuanchao Li
Zixing Zhang
Jing Han
P. Bell
Catherine Lai
60
0
0
25 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
18
0
0
25 Sep 2024
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang
Ziyue Jiang
Ruiqi Li
Changhao Pan
Jinzheng He
Rongjie Huang
Chuxin Wang
Zhou Zhao
DiffM
VLM
41
4
0
24 Sep 2024
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
Hieu-Thi Luong
Haoyang Li
Lin Zhang
Kong Aik Lee
Eng Siong Chng
54
2
0
23 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
Canh Vu
46
3
0
23 Sep 2024
Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming
Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming
Simon Malan
Benjamin van Niekerk
Herman Kamper
25
0
0
22 Sep 2024
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
Carlos Hernandez-Olivan
Marc Delcroix
Tsubasa Ochiai
Daisuke Niizumi
Naohiro Tawara
Tomohiro Nakatani
Shoko Araki
34
2
0
19 Sep 2024
LLMs in Education: Novel Perspectives, Challenges, and Opportunities
LLMs in Education: Novel Perspectives, Challenges, and Opportunities
Bashar Alhafni
Sowmya Vajjala
Stefano Banno
Kaushal Kumar Maurya
Ekaterina Kochmar
AI4Ed
35
1
0
18 Sep 2024
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
31
9
0
18 Sep 2024
Discrete Unit based Masking for Improving Disentanglement in Voice
  Conversion
Discrete Unit based Masking for Improving Disentanglement in Voice Conversion
Philip H. Lee
Ismail Rasim Ulgen
Berrak Sisman
23
0
0
17 Sep 2024
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
Yufeng Yang
Desh Raj
Ju Lin
Niko Moritz
J. Jia
...
Egor Lakomkin
Yiteng Huang
Jacob Donley
Jay Mahadeokar
Ozlem Kalinli
19
2
0
17 Sep 2024
3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy
3DFacePolicy: Speech-Driven 3D Facial Animation with Diffusion Policy
Xuanmeng Sha
Liyun Zhang
Tomohiro Mashita
Yuki Uranishi
VGen
25
0
0
17 Sep 2024
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging
Charilaos Papaioannou
Emmanouil Benetos
Alexandros Potamianos
20
0
0
17 Sep 2024
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
Zakaria Aldeneh
Takuya Higuchi
Jee-weon Jung
Li-Wei Chen
Stephen Shum
Ahmed Hussen Abdelaziz
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
VLM
SSL
42
0
0
16 Sep 2024
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Li-Wei Chen
Takuya Higuchi
He Bai
Ahmed Hussen Abdelaziz
Alexander Rudnicky
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
Zakaria Aldeneh
38
0
0
16 Sep 2024
Towards Automatic Assessment of Self-Supervised Speech Models using Rank
Towards Automatic Assessment of Self-Supervised Speech Models using Rank
Zakaria Aldeneh
Vimal Thilak
Takuya Higuchi
B. Theobald
Tatiana Likhomanenko
SSL
67
0
0
16 Sep 2024
A Simple HMM with Self-Supervised Representations for Phone Segmentation
A Simple HMM with Self-Supervised Representations for Phone Segmentation
Gene-Ping Yang
Hao Tang
SSL
33
0
0
15 Sep 2024
Self-supervised Learning for Acoustic Few-Shot Classification
Self-supervised Learning for Acoustic Few-Shot Classification
Jingyong Liang
Bernd Meyer
Issac Ning Lee
Thanh-Toan Do
SSL
52
0
0
15 Sep 2024
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion
Sho Inoue
Shuai Wang
Wanxing Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
27
1
0
14 Sep 2024
Universal Pooling Method of Multi-layer Features from Pretrained Models
  for Speaker Verification
Universal Pooling Method of Multi-layer Features from Pretrained Models for Speaker Verification
Jin Sob Kim
Hyun Joon Park
Wooseok Shin
Sung Won Han
SLR
48
0
0
12 Sep 2024
Previous
123456789
Next