HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

14 June 2021
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
    SSL

Papers citing "HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units"

50 / 451 papers shown
Title
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Mohit Bansal
VLM
49
28
0
28 Sep 2022
Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization
Xiaokang Zhao
Qiu-shi Zhu
Jie M. Zhang
32
4
0
28 Sep 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
R. Olivier
H. Abdullah
Bhiksha Raj
AAML
21
1
0
17 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
22
566
0
07 Sep 2022
DualVoice: Speech Interaction that Discriminates between Normal and Whispered Voice Input
Jun Rekimoto
14
6
0
22 Aug 2022
Fully Automated End-to-End Fake Audio Detection
Chenglong Wang
Jiangyan Yi
J. Tao
Haiyang Sun
Xun Chen
Zhengkun Tian
Haoxin Ma
Cunhang Fan
Ruibo Fu
24
28
0
20 Aug 2022
3M: An Effective Multi-view, Multi-granularity, and Multi-aspect Modeling Approach to English Pronunciation Assessment
Fu-An Chao
Tien-Hong Lo
Tzu-I Wu
Yao-Ting Sung
Berlin Chen
26
41
0
19 Aug 2022
Extending RNN-T-based speech recognition systems with emotion and language classification
Zvi Kons
Hagai Aronowitz
E. Morais
Matheus Damasceno
H. Kuo
Samuel Thomas
G. Saon
14
5
0
28 Jul 2022
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
Longshen Ou
Xiangming Gu
Ye Wang
25
21
0
20 Jul 2022
ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale
Gopinath Chennupati
Milind Rao
Gurpreet Chadha
Aaron Eakin
A. Raju
...
Andrew Oberlin
Buddha Nandanoor
Prahalad Venkataramanan
Zheng Wu
Pankaj Sitpure
CLL
16
8
0
19 Jul 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Wei-Ning Hsu
Bowen Shi
SSL
VLM
14
41
0
14 Jul 2022
Semi-supervised cross-lingual speech emotion recognition
Mirko Agarla
Simone Bianco
Luigi Celona
Paolo Napoletano
A. Petrovsky
Flavio Piccoli
Raimondo Schettini
I. Shanin
25
14
0
14 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
13
268
0
13 Jul 2022
Machine Learning Model Sizes and the Parameter Gap
Pablo Villalobos
J. Sevilla
T. Besiroglu
Lennart Heim
A. Ho
Marius Hobbhahn
ALM
ELM
AI4CE
18
56
0
05 Jul 2022
FitHuBERT: Going Thinner and Deeper for Knowledge Distillation of Speech Self-Supervised Learning
Yeonghyeon Lee
Kangwook Jang
Jahyun Goo
Youngmoon Jung
Hoi-Rim Kim
15
28
0
01 Jul 2022
Toward Low-Cost End-to-End Spoken Language Understanding
Marco Dinarelli
M. Naguib
François Portet
11
5
0
01 Jul 2022
FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition
Szu-Jui Chen
Jiamin Xie
John H. L. Hansen
29
8
0
30 Jun 2022
STOP: A dataset for Spoken Task Oriented Semantic Parsing
Paden Tomasello
Akshat Shrivastava
Daniel Lazar
Po-Chun Hsu
Duc Le
...
Robin Algayres
Tu Nguyen
Emmanuel Dupoux
Luke Zettlemoyer
Abdel-rahman Mohamed
17
35
0
29 Jun 2022
Distilling a Pretrained Language Model to a Multilingual ASR Model
Kwanghee Choi
Hyung-Min Park
VLM
11
10
0
25 Jun 2022
Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers
Joshua Belanich
Krishna Somandepalli
B. Eoff
Brendan Jou
22
2
0
24 Jun 2022
Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models
Han Ji
T. Patel
O. Scharenborg
29
7
0
24 Jun 2022
Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track
Tilak Purohit
Imen Ben Mahmoud
Bogdan Vlasenko
Mathew Magimai.-Doss
SSL
15
8
0
23 Jun 2022
Masked Siamese ConvNets
L. Jing
Jiachen Zhu
Yann LeCun
SSL
35
34
0
15 Jun 2022
Transformer-based Automatic Speech Recognition of Formal and Colloquial Czech in MALACH Project
Jan Lehecka
J. Psutka
Josef Psutka
8
4
0
15 Jun 2022
Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition
Anjana Arunkumar
Vrunda N. Sukhadia
S. Umesh
22
10
0
11 Jun 2022
AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems
Guangke Chen
Zhe Zhao
Fu Song
Sen Chen
Lingling Fan
Yang Liu
AAML
25
18
0
07 Jun 2022
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Sehoon Kim
A. Gholami
Albert Eaton Shaw
Nicholas Lee
K. Mangalam
Jitendra Malik
Michael W. Mahoney
Kurt Keutzer
19
99
0
02 Jun 2022
Do self-supervised speech models develop human-like perception biases?
Juliette Millet
Ewan Dunbar
SSL
11
20
0
31 May 2022
Is Lip Region-of-Interest Sufficient for Lipreading?
Jing-Xuan Zhang
Genshun Wan
Jia-Yu Pan
16
6
0
28 May 2022
Self-supervised models of audio effectively explain human cortical responses to speech
Aditya R. Vaidya
Shailee Jain
Alexander G. Huth
20
42
0
27 May 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
124
348
0
21 May 2022
Voice Activity Projection: Self-supervised Learning of Turn-taking Events
Erik Ekstedt
Gabriel Skantze
14
33
0
19 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
Sameer Khurana
Antoine Laurent
James R. Glass
25
36
0
17 May 2022
Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Chiyu Feng
Po-Chun Hsu
Hung-yi Lee
SSL
20
8
0
08 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
Unsupervised Word Segmentation using K Nearest Neighbors
T. Fuchs
Yedid Hoshen
Joseph Keshet
SSL
12
6
0
27 Apr 2022
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
24
65
0
26 Apr 2022
Mask scalar prediction for improving robust automatic speech recognition
A. Narayanan
James Walker
S. Panchapagesan
N. Howard
Yuma Koizumi
11
4
0
26 Apr 2022
On-demand compute reduction with stochastic wav2vec 2.0
Apoorv Vyas
Wei-Ning Hsu
Michael Auli
Alexei Baevski
18
13
0
25 Apr 2022
WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment
Lin Yao
Jianfei Song
Rui Xu
Yingfang Yang
Zijian Chen
Yafeng Deng
VLM
13
2
0
22 Apr 2022
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
14
110
0
20 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
SSL
34
53
0
15 Apr 2022
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems
Vishal Sunder
Eric Fosler-Lussier
Samuel Thomas
H. Kuo
Brian Kingsbury
21
7
0
11 Apr 2022
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance
Lin Zhang
Xin Wang
Erica Cooper
Nicholas W. D. Evans
Junichi Yamagishi
19
56
0
11 Apr 2022
Fusion of Self-supervised Learned Models for MOS Prediction
Zhengdong Yang
Wangjin Zhou
Chenhui Chu
Sheng Li
Raj Dabre
Raphaël Rubino
Yi Zhao
20
28
0
11 Apr 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022
Jiameng Gao
18
0
0
08 Apr 2022
GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye
Chengqi Zhao
Tom Ko
Chutong Meng
Tao Wang
Mingxuan Wang
Jun Cao
9
23
0
08 Apr 2022
Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning
Eesung Kim
J. Jeon
Hyeji Seo
Ho-Young Kim
SSL
21
37
0
08 Apr 2022
MTI-Net: A Multi-Target Speech Intelligibility Prediction Model
Ryandhimas E. Zezario
Szu-Wei Fu
Fei Chen
C. Fuh
Hsin-Min Wang
Yu Tsao
19
13
0
07 Apr 2022
MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids
Ryandhimas E. Zezario
Fei Chen
C. Fuh
Hsin-Min Wang
Yu Tsao
24
16
0
07 Apr 2022