An Unsupervised Autoregressive Model for Speech Representation Learning

5 April 2019
Yu-An Chung
Wei-Ning Hsu
Hao Tang
James R. Glass
    SSL

Papers citing "An Unsupervised Autoregressive Model for Speech Representation Learning"

50 / 269 papers shown
Title
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Interspeech (Interspeech), 2023
Wangyou Zhang
Y. Qian
187
12
0
25 May 2023
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
Interspeech (Interspeech), 2023
Eklavya Sarkar
Mathew Magimai.-Doss
184
17
0
23 May 2023
Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces
Interspeech (Interspeech), 2023
Oli Danyi Liu
Hao Tang
Sharon Goldwater
SSL
164
14
0
21 May 2023
DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding
Interspeech (Interspeech), 2023
Ziqian Ning
Yuepeng Jiang
Pengcheng Zhu
Jixun Yao
Shuai Wang
Linfu Xie
Mengxiao Bi
172
13
0
21 May 2023
TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Tiantian Feng
Rajat Hebbar
Shrikanth Narayanan
147
9
0
18 May 2023
Speech Separation based on Contrastive Learning and Deep Modularization
Peter Ochieng
SSL
176
0
0
18 May 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Neural Information Processing Systems (NeurIPS), 2023
Alexander H. Liu
Heng-Jui Chang
Michael Auli
Wei-Ning Hsu
James R. Glass
410
33
0
17 May 2023
Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Maryam Fazel-Zarandi
Wei-Ning Hsu
SSL
121
12
0
20 Mar 2023
IRGen: Generative Modeling for Image Retrieval
European Conference on Computer Vision (ECCV), 2023
Yidan Zhang
Ting Zhang
Dong Chen
Yujing Wang
Qi Chen
...
Tao Gui
Fan Yang
Mao Yang
Q. Liao
B. Guo
3DV VLM
283
21
0
17 Mar 2023
Analysing the Masked predictive coding training criterion for pre-training a Speech Representation Model
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
187
4
0
13 Mar 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
164
16
0
12 Mar 2023
Improving Self-Supervised Learning for Audio Representations by Feature Diversity and Decorrelation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Bac Nguyen
Stefan Uhlich
Fabien Cardinaux
SSL
198
4
0
07 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
318
342
0
02 Mar 2023
deHuBERT: Disentangling Noise in a Self-supervised Model for Robust Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Dianwen Ng
Ruixi Zhang
J. Yip
Zhao Yang
Jinjie Ni
Chong Zhang
Yukun Ma
Chongjia Ni
Eng Siong Chng
B. Ma
220
18
0
28 Feb 2023
A low latency attention module for streaming self-supervised speech representation learning
Jianbo Ma
Siqi Pan
Deepak Chandran
A. Fanelli
Richard Cartwright
152
0
0
27 Feb 2023
BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition
Will Rieger
BDL UQCV
112
0
0
16 Jan 2023
Context-aware Fine-tuning of Self-supervised Speech Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Suwon Shon
Felix Wu
Kwangyoun Kim
Prashant Sridhar
Karen Livescu
Shinji Watanabe
130
10
0
16 Dec 2022
Deep neural network techniques for monaural speech enhancement: state of the art analysis
Artificial Intelligence Review (Artif Intell Rev), 2022
P. Ochieng
217
33
0
01 Dec 2022
CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models
Zih-Ching Chen
Yu-Shun Sung
Hung-yi Lee
173
20
0
01 Dec 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
IEEE Transactions on Multimedia (IEEE TMM), 2022
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
211
48
0
21 Nov 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
Automatic Speech Recognition & Understanding (ASRU), 2022
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
196
18
0
17 Nov 2022
Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers
Tzu-Quan Lin
Tsung-Huan Yang
Chun-Yao Chang
Kuang-Ming Chen
Tzu-hsun Feng
Hung-yi Lee
Hao Tang
234
6
0
17 Nov 2022
Introducing Semantics into Speech Encoders
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Derek Xu
Shuyan Dong
Changhan Wang
Suyoun Kim
Mohammad Kachuee
...
Alexei Baevski
Guan-Ting Lin
Hung-yi Lee
Luke Huan
Wei Wang
SSL
148
3
0
15 Nov 2022
Improving Children's Speech Recognition by Fine-tuning Self-supervised Adult Speech Representations
Renée Lu
M. Shahin
Beena Ahmed
157
7
0
14 Nov 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Interspeech (Interspeech), 2022
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
294
21
0
14 Nov 2022
Investigating Enhancements to Contrastive Predictive Coding for Human Activity Recognition
Annual IEEE International Conference on Pervasive Computing and Communications (PerCom), 2022
H. Haresamudram
Irfan Essa
Thomas Ploetz
AI4TS
270
20
0
11 Nov 2022
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zili Huang
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yiming Wang
Jinyu Li
Takuya Yoshioka
Xiaofei Wang
Peidong Wang
165
4
0
10 Nov 2022
Biased Self-supervised learning for ASR
Interspeech (Interspeech), 2022
Florian Kreyssig
Yangyang Shi
Jinxi Guo
Leda Sari
Abdel-rahman Mohamed
P. Woodland
SSL
143
4
0
04 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Neural Information Processing Systems (NeurIPS), 2022
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
316
10
0
02 Nov 2022
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yuhang Yang
Haihua Xu
Hao-Ming Huang
Eng Siong Chng
Sheng Li
160
7
0
01 Nov 2022
Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Sathvik Udupa
Siddarth C
P. Ghosh
155
10
0
30 Oct 2022
Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Sung-Lin Yeh
Hao Tang
SSL BDL
112
1
0
29 Oct 2022
Application of Knowledge Distillation to Multi-task Speech Representation Learning
Interspeech (Interspeech), 2022
Mine Kerpicci
V. Nguyen
Shuhua Zhang
Erik M. Visser
156
0
0
29 Oct 2022
FedAudio: A Federated Learning Benchmark for Audio Tasks
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Tuo Zhang
Tiantian Feng
Samiul Alam
Sunwoo Lee
Mi Zhang
Shrikanth S. Narayanan
Salman Avestimehr
FedML
227
30
0
27 Oct 2022
Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach
International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022
Xulong Zhang
Jianzong Wang
Ning Cheng
Kexin Zhu
Jing Xiao
118
1
0
25 Oct 2022
Guided contrastive self-supervised pre-training for automatic speech recognition
Spoken Language Technology Workshop (SLT), 2022
Aparna Khare
Minhua Wu
Saurabhchand Bhati
J. Droppo
Roland Maas
SSL
148
0
0
22 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Spoken Language Technology Workshop (SLT), 2022
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM SSL
196
38
0
16 Oct 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ruchao Fan
Yiming Wang
Yashesh Gaur
Jinyu Li
244
8
0
16 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
Spoken Language Technology Workshop (SLT), 2022
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen-An Li
Hung-yi Lee
Nigel G. Ward
125
59
0
13 Oct 2022
Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora
Spoken Language Technology Workshop (SLT), 2022
Yuanchao Li
Yumnah Mohamied
P. Bell
Catherine Lai
SSL
290
51
0
05 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Spoken Language Technology Workshop (SLT), 2022
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David Harwath
VLM CLIP
335
39
0
03 Oct 2022
Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling
International Workshop on Spoken Language Translation (IWSLT), 2022
Itai Gat
Felix Kreuk
Tu Nguyen
Ann Lee
Jade Copet
Gabriel Synnaeve
Emmanuel Dupoux
Yossi Adi
179
14
0
30 Sep 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
241
64
0
30 Sep 2022
The Efficacy of Self-Supervised Speech Models for Audio Representations
Tung-Yu Wu
Chen-An Li
Tzu-Han Lin
Tsung-Yuan Hsu
Hung-yi Lee
178
6
0
26 Sep 2022
Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Jaejin Cho
Jesús Villalba
Laureano Moro-Velazquez
Najim Dehak
SSL
164
22
0
10 Aug 2022
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
International Society for Music Information Retrieval Conference (ISMIR), 2022
Longshen Ou
Xiangming Gu
Ye Wang
157
24
0
20 Jul 2022
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
Interspeech (Interspeech), 2022
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
143
34
0
14 Jul 2022
FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition
Interspeech (Interspeech), 2022
Szu-Jui Chen
Jiamin Xie
John H. L. Hansen
170
9
0
30 Jun 2022
Wav2Vec-Aug: Improved self-supervised training with limited data
Interspeech (Interspeech), 2022
Anuroop Sriram
Michael Auli
Alexei Baevski
SSL VLM
153
16
0
27 Jun 2022
Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models
Han Ji
T. Patel
O. Scharenborg
200
9
0
24 Jun 2022