ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.07447
  4. Cited By
HuBERT: Self-Supervised Speech Representation Learning by Masked
  Prediction of Hidden Units

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

14 June 2021
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
    SSL
ArXivPDFHTML

Papers citing "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units"

50 / 445 papers shown
Title
Human Transcription Quality Improvement
Human Transcription Quality Improvement
Jian Gao
Hanbo Sun
Cheng Cao
Zheng Du
27
2
0
24 Sep 2023
Audio Contrastive based Fine-tuning
Audio Contrastive based Fine-tuning
Yang Wang
Qibin Liang
Chenghao Xiao
Yizhi Li
Noura Al Moubayed
Chenghua Lin
24
0
0
21 Sep 2023
Towards Joint Modeling of Dialogue Response and Speech Synthesis based
  on Large Language Model
Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model
Xinyu Zhou
Delong Chen
Yudong Chen
AuLLM
27
0
0
20 Sep 2023
Test-Time Training for Speech
Test-Time Training for Speech
Sri Harsha Dumpala
Chandramouli Shama Sastry
Sageev Oore
35
1
0
19 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for
  Speaker and Speech Recognition
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
24
12
0
19 Sep 2023
Multimodal Modeling For Spoken Language Identification
Multimodal Modeling For Spoken Language Identification
Shikhar Bharadwaj
Min Ma
Shikhar Vashishth
Ankur Bapna
Sriram Ganapathy
...
Yu Zhang
D. Esch
Sandy Ritchie
Partha P. Talukdar
Jason Riesa
30
0
0
19 Sep 2023
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using
  Whisper and Metadata
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata
Ryandhimas E. Zezario
Fei Chen
C. Fuh
H. Wang
Yu Tsao
37
1
0
18 Sep 2023
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit
  for Neural Speech Codec
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du
Shiliang Zhang
Kai Hu
Siqi Zheng
24
54
0
14 Sep 2023
Highly Controllable Diffusion-based Any-to-Any Voice Conversion Model
  with Frame-level Prosody Feature
Highly Controllable Diffusion-based Any-to-Any Voice Conversion Model with Frame-level Prosody Feature
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
16
0
0
06 Sep 2023
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any
  Voice Conversion using Only Speech Data
Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim
Kyungguen Byun
Sunkuk Moon
Erik Visser
DiffM
24
2
0
06 Sep 2023
Leveraging Label Information for Multimodal Emotion Recognition
Leveraging Label Information for Multimodal Emotion Recognition
Pei-Hsin Wang
Sunlu Zeng
Junqing Chen
Lu Fan
Meng Chen
Youzheng Wu
Xiaodong He
27
4
0
05 Sep 2023
Learning Speech Representation From Contrastive Token-Acoustic
  Pretraining
Learning Speech Representation From Contrastive Token-Acoustic Pretraining
Chunyu Qiang
Hao Li
Yixin Tian
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
21
5
0
01 Sep 2023
Improving Small Footprint Few-shot Keyword Spotting with Supervision on
  Auxiliary Data
Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data
Seunghan Yang
Byeonggeun Kim
Kyuhong Shim
Simyoung Chang
24
1
0
31 Aug 2023
RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic
  Weighting
RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting
Haibo Wang
Shiwan Zhao
Xiguang Zheng
Yong Qin
21
11
0
31 Aug 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for
  Automatic Speech Recognition
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
26
2
0
28 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate
  Pooling for Speaker Identification
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Harunori Kawano
Sota Shimizu
30
1
0
22 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General
  Speech Knowledge and Language-specific Knowledge
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
34
16
0
18 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
31
1
0
14 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised
  Pretraining
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
25
221
0
10 Aug 2023
Elucidate Gender Fairness in Singing Voice Transcription
Elucidate Gender Fairness in Singing Voice Transcription
Xiangming Gu
Weizhen Zeng
Ye Wang
10
3
0
05 Aug 2023
UniBriVL: Robust Universal Representation and Generation of Audio Driven
  Diffusion Models
UniBriVL: Robust Universal Representation and Generation of Audio Driven Diffusion Models
Sen Fang
Bowen Gao
Yangjian Wu
T. Teoh
DiffM
18
1
0
29 Jul 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model
  and Language Model: A Comparative Study of Semantic Coding
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
DiffM
30
8
0
28 Jul 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion
  Recognition
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
28
35
0
20 Jul 2023
On-Device Constrained Self-Supervised Speech Representation Learning for
  Keyword Spotting via Knowledge Distillation
On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
Gene-Ping Yang
Yue Gu
Qingming Tang
Dongsu Du
Yuzong Liu
14
5
0
06 Jul 2023
Understanding Contrastive Learning Through the Lens of Margins
Understanding Contrastive Learning Through the Lens of Margins
Daniel Rho
Taesoo Kim
Sooill Park
Jaehyun Park
Jaehan Park
SSL
25
2
0
20 Jun 2023
Multi-pass Training and Cross-information Fusion for Low-resource
  End-to-end Accented Speech Recognition
Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Xuefei Wang
Yanhua Long
Yijie Li
Haoran Wei
25
4
0
20 Jun 2023
When to Use Efficient Self Attention? Profiling Text, Speech and Image
  Transformer Variants
When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants
Anuj Diwan
Eunsol Choi
David F. Harwath
39
0
0
14 Jun 2023
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge
  in Speech Emotion Recognition
MFSN: Multi-perspective Fusion Search Network For Pre-training Knowledge in Speech Emotion Recognition
Haiyang Sun
Fulin Zhang
Yingying Gao
Zheng Lian
Shilei Zhang
Junlan Feng
19
4
0
12 Jun 2023
Simultaneous or Sequential Training? How Speech Representations
  Cooperate in a Multi-Task Self-Supervised Learning System
Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System
Khazar Khorrami
María Andrea Cruz Blandón
Tuomas Virtanen
Okko Rasanen
SSL
20
1
0
05 Jun 2023
On the Robustness of Arabic Speech Dialect Identification
On the Robustness of Arabic Speech Dialect Identification
Peter Sullivan
AbdelRahim Elmadany
Muhammad Abdul-Mageed
15
8
0
01 Jun 2023
Some voices are too common: Building fair speech recognition systems
  using the Common Voice dataset
Some voices are too common: Building fair speech recognition systems using the Common Voice dataset
Lucas Maison
Yannick Esteve
24
3
0
01 Jun 2023
Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio
  Anti-spoofing
Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing
Hye-jin Shim
Jee-weon Jung
Tomi Kinnunen
19
13
0
31 May 2023
MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models
MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models
Yu-Hsiang Wang
Huan Chen
Kai-Wei Chang
Winston H. Hsu
Hung-yi Lee
16
6
0
30 May 2023
A Hierarchical Context-aware Modeling Approach for Multi-aspect and
  Multi-granular Pronunciation Assessment
A Hierarchical Context-aware Modeling Approach for Multi-aspect and Multi-granular Pronunciation Assessment
Fu-An Chao
Tien-Hong Lo
Tzu-I Wu
Yao-Ting Sung
Berlin Chen
21
7
0
29 May 2023
Improving Textless Spoken Language Understanding with Discrete Units as
  Intermediate Target
Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target
Guanyong Wu
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
18
5
0
29 May 2023
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
Xilin Jiang
Yinghao Aaron Li
N. Mesgarani
CLL
19
1
0
29 May 2023
Investigating Pre-trained Audio Encoders in the Low-Resource Condition
Investigating Pre-trained Audio Encoders in the Low-Resource Condition
Haomiao Yang
Jinming Zhao
Gholamreza Haffari
Ehsan Shareghi
14
6
0
28 May 2023
Inter-connection: Effective Connection between Pre-trained Encoder and
  Decoder for Speech Translation
Inter-connection: Effective Connection between Pre-trained Encoder and Decoder for Speech Translation
Yuta Nishikawa
Satoshi Nakamura
30
4
0
26 May 2023
Visually grounded few-shot word acquisition with fewer shots
Visually grounded few-shot word acquisition with fewer shots
Leanne Nortje
Benjamin van Niekerk
Herman Kamper
16
1
0
25 May 2023
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Rongjie Huang
Huadai Liu
Xize Cheng
Yi Ren
Lin Li
...
Jinzheng He
Lichao Zhang
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
67
8
0
24 May 2023
Detecting Check-Worthy Claims in Political Debates, Speeches, and
  Interviews Using Audio Data
Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data
Petar Ivanov
Ivan Koychev
Momchil Hardalov
Preslav Nakov
19
4
0
24 May 2023
Difference-Masking: Choosing What to Mask in Continued Pretraining
Difference-Masking: Choosing What to Mask in Continued Pretraining
Alex Wilf
Syeda Nahida Akter
Leena Mathur
Paul Pu Liang
Sheryl Mathew
Mengrou Shou
Eric Nyberg
Louis-Philippe Morency
CLL
SSL
24
4
0
23 May 2023
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
DiffM
26
20
0
22 May 2023
Self-supervised Predictive Coding Models Encode Speaker and Phonetic
  Information in Orthogonal Subspaces
Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces
Oli Danyi Liu
Hao Tang
Sharon Goldwater
SSL
25
12
0
21 May 2023
JNV Corpus: A Corpus of Japanese Nonverbal Vocalizations with Diverse
  Phrases and Emotions
JNV Corpus: A Corpus of Japanese Nonverbal Vocalizations with Diverse Phrases and Emotions
Detai Xin
Shinnosuke Takamichi
Hiroshi Saruwatari
12
5
0
21 May 2023
Self-supervised representations in speech-based depression detection
Self-supervised representations in speech-based depression detection
Wen Wu
C. Zhang
P. Woodland
14
23
0
20 May 2023
Scaling laws for language encoding models in fMRI
Scaling laws for language encoding models in fMRI
Richard Antonello
Aditya R. Vaidya
Alexander G. Huth
MedIm
22
55
0
19 May 2023
Pengi: An Audio Language Model for Audio Tasks
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
30
157
0
19 May 2023
Recycle-and-Distill: Universal Compression Strategy for
  Transformer-based Speech SSL Models with Attention Map Reusing and Masking
  Distillation
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Kangwook Jang
Sungnyun Kim
Se-Young Yun
Hoi-Rim Kim
24
5
0
19 May 2023
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low
  Resource Setting
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting
Neil Shah
Vishal Tambrahalli
Saiteja Kosgi
N. Pedanekar
Vineet Gandhi
33
0
0
19 May 2023
Previous
123456789
Next