Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.07447
Cited By
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
14 June 2021
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units"
50 / 451 papers shown
Title
Model Extraction Attack against Self-supervised Speech Models
Tsung-Yuan Hsu
Chen An Li
Tung-Yu Wu
Hung-yi Lee
19
1
0
29 Nov 2022
TESSP: Text-Enhanced Self-Supervised Speech Pre-training
Zhuoyuan Yao
Shuo Ren
Sanyuan Chen
Ziyang Ma
Pengcheng Guo
Linfu Xie
22
5
0
24 Nov 2022
Device Directedness with Contextual Cues for Spoken Dialog Systems
Dhanush Bekal
S. Srinivasan
S. Bodapati
S. Ronanki
Katrin Kirchhoff
31
1
0
23 Nov 2022
Exploring WavLM on Speech Enhancement
Hyungchan Song
Sanyuan Chen
Zhuo Chen
Yu-Huan Wu
Takuya Yoshioka
M. Tang
Jong Won Shin
Shujie Liu
8
16
0
18 Nov 2022
Compressing Transformer-based self-supervised models for speech processing
Tzu-Quan Lin
Tsung-Huan Yang
Chun-Yao Chang
Kuang-Ming Chen
Tzu-hsun Feng
Hung-yi Lee
Hao Tang
32
6
0
17 Nov 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
24
13
0
17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
16
46
0
17 Nov 2022
A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units
Li-Wei Chen
Shinji Watanabe
Alexander I. Rudnicky
18
6
0
12 Nov 2022
Speech separation with large-scale self-supervised learning
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yu-Huan Wu
Xiaofei Wang
Takuya Yoshioka
Jinyu Li
S. Sivasankaran
Sefik Emre Eskimez
17
14
0
09 Nov 2022
Distribution-based Emotion Recognition in Conversation
Wen Wu
C. Zhang
P. Woodland
19
4
0
09 Nov 2022
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations
Paul-Ambroise Duquenne
Hongyu Gong
Ning Dong
Jingfei Du
Ann Lee
Vedanuj Goswani
Changhan Wang
J. Pino
Benoît Sagot
Holger Schwenk
37
34
0
08 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Xiaoran Fan
Chao Pang
Tian Yuan
Richard He Bai
Renjie Zheng
...
Junkun Chen
Zeyu Chen
Liang Huang
Yu Sun
Hua-Hong Wu
35
0
0
07 Nov 2022
Once-for-All Sequence Compression for Self-Supervised Speech Models
Hsuan-Jui Chen
Yen Meng
Hung-yi Lee
14
4
0
04 Nov 2022
When to Laugh and How Hard? A Multimodal Approach to Detecting Humor and its Intensity
Khalid Alnajjar
Mika Hämäläinen
Jörg Tiedemann
Jorma T. Laaksonen
M. Kurimo
18
2
0
03 Nov 2022
Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR
Vrunda N. Sukhadia
Anjana Arunkumar
S. Umesh
18
1
0
03 Nov 2022
Phoneme Segmentation Using Self-Supervised Speech Models
Luke Strgar
David F. Harwath
SSL
22
10
0
02 Nov 2022
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
33
5
0
02 Nov 2022
Audio Language Modeling using Perceptually-Guided Discrete Representations
Felix Kreuk
Yaniv Taigman
Adam Polyak
Jade Copet
Gabriel Synnaeve
Alexandre Défossez
Yossi Adi
27
4
0
02 Nov 2022
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition
Lester Phillip Violeta
D. Ma
Wen-Chin Huang
T. Toda
19
7
0
02 Nov 2022
Avoid Overthinking in Self-Supervised Models for Speech Recognition
Dan Berrebbi
Brian Yan
Shinji Watanabe
LRM
13
4
0
01 Nov 2022
Textless Direct Speech-to-Speech Translation with Discrete Speech Representation
Xinjian Li
Ye Jia
Chung-Cheng Chiu
23
23
0
31 Oct 2022
Scoring Black-Box Models for Adversarial Robustness
Jian Vora
Pranay Reddy Samala
23
0
0
31 Oct 2022
Mining Word Boundaries in Speech as Naturally Annotated Word Segmentation Data
Lei Zhang
Zhenghua Li
Shilin Zhou
Chen Gong
Zhefeng Wang
Baoxing Huai
Min Zhang
16
0
0
31 Oct 2022
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation
Liyong Guo
Xiaoyu Yang
Quandong Wang
Yuxiang Kong
Zengwei Yao
...
Wei Kang
Long Lin
Mingshuang Luo
Piotr Żelasko
Daniel Povey
VLM
21
7
0
31 Oct 2022
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation
Kun Wei
Long Zhou
Zi-Hua Zhang
Liping Chen
Shujie Liu
Lei He
Jinyu Li
Furu Wei
14
13
0
31 Oct 2022
Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models
Ramon Sanabria
Hao Tang
Sharon Goldwater
SSL
38
18
0
28 Oct 2022
Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge
Ewan Dunbar
Nicolas Hamilakis
Emmanuel Dupoux
SSL
24
30
0
27 Oct 2022
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
Yujin Wang
Changli Tang
Ziyang Ma
Zhisheng Zheng
Xie Chen
Weiqiang Zhang
31
1
0
27 Oct 2022
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning
Qiu-shi Zhu
Long Zhou
Jie M. Zhang
Shujie Liu
Yu-Chen Hu
Lirong Dai
VLM
SSL
50
37
0
27 Oct 2022
Training Autoregressive Speech Recognition Models with Limited in-domain Supervision
Chak-Fai Li
Francis Keith
William Hartmann
M. Snover
14
0
0
27 Oct 2022
Real-time Speech Interruption Analysis: From Cloud to Client Deployment
Quchen Fu
Szu-Wei Fu
Yaran Fan
Yu-Huan Wu
Zhuo Chen
J. Gupchup
Ross Cutler
26
0
0
24 Oct 2022
Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing
Haomiao Yang
Jinming Zhao
Gholamreza Haffari
Ehsan Shareghi
25
2
0
24 Oct 2022
Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings
Jian Zhu
Zuoyu Tian
Yadong Liu
Cong Zhang
Chia-wen Lo
SSL
30
2
0
23 Oct 2022
DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents
Ziqiao Ma
B. VanDerPloeg
Cristian-Paul Bara
Yidong Huang
Eui-In Kim
Felix Gervits
M. Marge
J. Chai
52
7
0
22 Oct 2022
A Textless Metric for Speech-to-Speech Comparison
Laurent Besacier
S. Ribeiro
Olivier Galibert
Ioan Calapodescu
33
5
0
21 Oct 2022
SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video
Marija Jegorova
Stavros Petridis
M. Pantic
22
2
0
20 Oct 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
15
19
0
19 Oct 2022
Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering
Yaming Yang
Ziyu Guan
Zhe Wang
Wei Zhao
Cai Xu
Weigang Lu
Jianbin Huang
SSL
13
39
0
19 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
24
33
0
16 Oct 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Ruchao Fan
Yiming Wang
Yashesh Gaur
Jinyu Li
33
7
0
16 Oct 2022
Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation
Chendong Zhao
Jianzong Wang
Xiaoyang Qu
Haoqian Wang
Jing Xiao
SSL
30
1
0
15 Oct 2022
Improving generalizability of distilled self-supervised speech processing models under distorted settings
Kuan-Po Huang
Yu-Kuan Fu
Tsung-Yuan Hsu
Fabian Ritter Gutierrez
Fan Wang
Liang-Hsuan Tseng
Yu Zhang
Hung-yi Lee
24
13
0
14 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Bo-wen Li
Weiran Wang
Trevor Strohman
RALM
AuLLM
38
35
0
13 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Nigel G. Ward
21
47
0
13 Oct 2022
A context-aware knowledge transferring strategy for CTC-based ASR
Keda Lu
Kuan-Yu Chen
15
14
0
12 Oct 2022
GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models
Matthew Baas
Herman Kamper
DiffM
24
8
0
11 Oct 2022
CTC Alignments Improve Autoregressive Translation
Brian Yan
Siddharth Dalmia
Yosuke Higuchi
Graham Neubig
Florian Metze
A. Black
Shinji Watanabe
44
33
0
11 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
Chutong Meng
Junyi Ao
Tom Ko
Mingxuan Wang
Haizhou Li
SSL
39
6
0
08 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David F. Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
24
119
0
02 Oct 2022
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
17
289
0
30 Sep 2022
Previous
1
2
3
...
10
6
7
8
9
Next