ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.07447
  4. Cited By
HuBERT: Self-Supervised Speech Representation Learning by Masked
  Prediction of Hidden Units

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

14 June 2021
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
    SSL
ArXivPDFHTML

Papers citing "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units"

50 / 451 papers shown
Title
Model Extraction Attack against Self-supervised Speech Models
Model Extraction Attack against Self-supervised Speech Models
Tsung-Yuan Hsu
Chen An Li
Tung-Yu Wu
Hung-yi Lee
19
1
0
29 Nov 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
22
5
0
24 Nov 2022
Device Directedness with Contextual Cues for Spoken Dialog Systems
Device Directedness with Contextual Cues for Spoken Dialog Systems
Dhanush Bekal
S. Srinivasan
S. Bodapati
S. Ronanki
Katrin Kirchhoff
31
1
0
23 Nov 2022
Exploring WavLM on Speech Enhancement
Exploring WavLM on Speech Enhancement
Hyungchan Song
Sanyuan Chen
Zhuo Chen
Yu-Huan Wu
Takuya Yoshioka
M. Tang
Jong Won Shin
Shujie Liu
8
16
0
18 Nov 2022
Compressing Transformer-based self-supervised models for speech
  processing
Compressing Transformer-based self-supervised models for speech processing
Tzu-Quan Lin
Tsung-Huan Yang
Chun-Yao Chang
Kuang-Ming Chen
Tzu-hsun Feng
Hung-yi Lee
Hao Tang
32
6
0
17 Nov 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
MelHuBERT: A simplified HuBERT on Mel spectrograms
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
24
13
0
17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
16
46
0
17 Nov 2022
A unified one-shot prosody and speaker conversion system with
  self-supervised discrete speech units
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
18
6
0
12 Nov 2022
Speech separation with large-scale self-supervised learning
Speech separation with large-scale self-supervised learning
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yu-Huan Wu
Xiaofei Wang
Takuya Yoshioka
Jinyu Li
S. Sivasankaran
Sefik Emre Eskimez
17
14
0
09 Nov 2022
Distribution-based Emotion Recognition in Conversation
Distribution-based Emotion Recognition in Conversation
Wen Wu
C. Zhang
P. Woodland
19
4
0
09 Nov 2022
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual
  Speech-to-Speech Translations
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
Paul-Ambroise Duquenne
Hongyu Gong
Ning Dong
Jingfei Du
Ann Lee
Vedanuj Goswani
Changhan Wang
J. Pino
Benoît Sagot
Holger Schwenk
37
34
0
08 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual
  Multi-Speaker Text-to-Speech
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Xiaoran Fan
Chao Pang
Tian Yuan
Richard He Bai
Renjie Zheng
...
Junkun Chen
Zeyu Chen
Liang Huang
Yu Sun
Hua-Hong Wu
35
0
0
07 Nov 2022
Once-for-All Sequence Compression for Self-Supervised Speech Models
Once-for-All Sequence Compression for Self-Supervised Speech Models
Hsuan-Jui Chen
Yen Meng
Hung-yi Lee
14
4
0
04 Nov 2022
When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and
  its Intensity
When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and its Intensity
Khalid Alnajjar
Mika Hämäläinen
Jörg Tiedemann
Jorma T. Laaksonen
M. Kurimo
18
2
0
03 Nov 2022
Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model
  for Telephonic-Speech ASR
Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR
Vrunda N. Sukhadia
Anjana Arunkumar
S. Umesh
18
1
0
03 Nov 2022
Phoneme Segmentation Using Self-Supervised Speech Models
Phoneme Segmentation Using Self-Supervised Speech Models
Luke Strgar
David F. Harwath
SSL
22
10
0
02 Nov 2022
data2vec-aqc: Search for the right Teaching Assistant in the
  Teacher-Student training setup
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
33
5
0
02 Nov 2022
Audio Language Modeling using Perceptually-Guided Discrete
  Representations
Audio Language Modeling using Perceptually-Guided Discrete Representations
Felix Kreuk
Yaniv Taigman
Adam Polyak
Jade Copet
Gabriel Synnaeve
Alexandre Défossez
Yossi Adi
27
4
0
02 Nov 2022
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving
  Electrolaryngeal Speech Recognition
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition
Lester Phillip Violeta
D. Ma
Wen-Chin Huang
T. Toda
19
7
0
02 Nov 2022
Avoid Overthinking in Self-Supervised Models for Speech Recognition
Avoid Overthinking in Self-Supervised Models for Speech Recognition
Dan Berrebbi
Brian Yan
Shinji Watanabe
LRM
13
4
0
01 Nov 2022
Textless Direct Speech-to-Speech Translation with Discrete Speech
  Representation
Textless Direct Speech-to-Speech Translation with Discrete Speech Representation
Xinjian Li
Ye Jia
Chung-Cheng Chiu
23
23
0
31 Oct 2022
Scoring Black-Box Models for Adversarial Robustness
Scoring Black-Box Models for Adversarial Robustness
Jian Vora
Pranay Reddy Samala
23
0
0
31 Oct 2022
Mining Word Boundaries in Speech as Naturally Annotated Word
  Segmentation Data
Mining Word Boundaries in Speech as Naturally Annotated Word Segmentation Data
Lei Zhang
Zhenghua Li
Shilin Zhou
Chen Gong
Zhefeng Wang
Baoxing Huai
Min Zhang
16
0
0
31 Oct 2022
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge
  Distillation
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation
Liyong Guo
Xiaoyu Yang
Quandong Wang
Yuxiang Kong
Zengwei Yao
...
Wei Kang
Long Lin
Mingshuang Luo
Piotr Żelasko
Daniel Povey
VLM
21
7
0
31 Oct 2022
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to
  Speech Translation
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation
Kun Wei
Long Zhou
Zi-Hua Zhang
Liping Chen
Shujie Liu
Lei He
Jinyu Li
Furu Wei
14
13
0
31 Oct 2022
Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised
  Speech Models
Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models
Ramon Sanabria
Hao Tang
Sharon Goldwater
SSL
38
18
0
28 Oct 2022
Self-supervised language learning from raw audio: Lessons from the Zero
  Resource Speech Challenge
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
24
30
0
27 Oct 2022
Exploring Effective Distillation of Self-Supervised Speech Models for
  Automatic Speech Recognition
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Yujin Wang
Changli Tang
Ziyang Ma
Zhisheng Zheng
Xie Chen
Weiqiang Zhang
31
1
0
27 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by
  Combining Regression and Improved Contrastive Learning
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
Qiu-shi Zhu
Long Zhou
Jie M. Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLM
SSL
50
37
0
27 Oct 2022
Training Autoregressive Speech Recognition Models with Limited in-domain
  Supervision
Training Autoregressive Speech Recognition Models with Limited in-domain Supervision
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
14
0
0
27 Oct 2022
Real-time Speech Interruption Analysis: From Cloud to Client Deployment
Real-time Speech Interruption Analysis: From Cloud to Client Deployment
Quchen Fu
Szu-Wei Fu
Yaran Fan
Yu-Huan Wu
Zhuo Chen
J. Gupchup
Ross Cutler
26
0
0
24 Oct 2022
Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster
  Fine-tuning with Less Labels in Speech Processing
Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing
Haomiao Yang
Jinming Zhao
Gholamreza Haffari
Ehsan Shareghi
25
2
0
24 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken
  sentence embeddings
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings
Jian Zhu
Zuoyu Tian
Yadong Liu
Cong Zhang
Chia-wen Lo
SSL
30
2
0
23 Oct 2022
DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in
  Interactive Autonomous Driving Agents
DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents
Ziqiao Ma
B. VanDerPloeg
Cristian-Paul Bara
Yidong Huang
Eui-In Kim
Felix Gervits
M. Marge
J. Chai
52
7
0
22 Oct 2022
A Textless Metric for Speech-to-Speech Comparison
A Textless Metric for Speech-to-Speech Comparison
Laurent Besacier
S. Ribeiro
Olivier Galibert
Ioan Calapodescu
33
5
0
21 Oct 2022
SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from
  Video
SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video
Marija Jegorova
Stavros Petridis
M. Pantic
22
2
0
20 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation,
  Beamforming, and Self-Supervised Learning Representation
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
15
19
0
19 Oct 2022
Self-supervised Heterogeneous Graph Pre-training Based on Structural
  Clustering
Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering
Yaming Yang
Ziyu Guan
Zhe Wang
Wei Zhao
Cai Xu
Weigang Lu
Jianbin Huang
SSL
13
39
0
19 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of
  Self-Supervised Speech Representation Learning
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
24
33
0
16 Oct 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Ruchao Fan
Yiming Wang
Yashesh Gaur
Jinyu Li
33
7
0
16 Oct 2022
Learning Invariant Representation and Risk Minimized for Unsupervised
  Accent Domain Adaptation
Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation
Chendong Zhao
Jianzong Wang
Xiaoyang Qu
Haoqian Wang
Jing Xiao
SSL
30
1
0
15 Oct 2022
Improving generalizability of distilled self-supervised speech
  processing models under distorted settings
Improving generalizability of distilled self-supervised speech processing models under distorted settings
Kuan-Po Huang
Yu-Kuan Fu
Tsung-Yuan Hsu
Fabian Ritter Gutierrez
Fan Wang
Liang-Hsuan Tseng
Yu Zhang
Hung-yi Lee
24
13
0
14 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Bo-wen Li
Weiran Wang
Trevor Strohman
RALM
AuLLM
38
35
0
13 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
On the Utility of Self-supervised Models for Prosody-related Tasks
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Nigel G. Ward
21
47
0
13 Oct 2022
A context-aware knowledge transferring strategy for CTC-based ASR
A context-aware knowledge transferring strategy for CTC-based ASR
Keda Lu
Kuan-Yu Chen
15
14
0
12 Oct 2022
GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from
  Diffusion Models
GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models
Matthew Baas
Herman Kamper
DiffM
24
8
0
11 Oct 2022
CTC Alignments Improve Autoregressive Translation
CTC Alignments Improve Autoregressive Translation
Brian Yan
Siddharth Dalmia
Yosuke Higuchi
Graham Neubig
Florian Metze
A. Black
Shinji Watanabe
44
33
0
11 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code
  Representation Learning
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
Chutong Meng
Junyi Ao
Tom Ko
Mingxuan Wang
Haizhou Li
SSL
39
6
0
08 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David F. Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
24
119
0
02 Oct 2022
AudioGen: Textually Guided Audio Generation
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
17
289
0
30 Sep 2022
Previous
123...106789
Next