1904.03240
An Unsupervised Autoregressive Model for Speech Representation Learning
5 April 2019
Yu-An Chung
Wei-Ning Hsu
Hao Tang
James R. Glass
SSL
Papers citing
"An Unsupervised Autoregressive Model for Speech Representation Learning"
50 / 269 papers shown
Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models
Tsung-En Lin
Kuan-Yi Lee
Hung-yi Lee
LLMSV
187
0
0
14 Oct 2025
Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
H. S. Bovbjerg
Jan Østergaard
Jesper Jensen
Shinji Watanabe
Zheng-Hua Tan
SSL
108
2
0
28 Aug 2025
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
Hugo Thimonier
Antony Perzo
Renaud Seguier
108
1
0
19 Aug 2025
Representing Speech Through Autoregressive Prediction of Cochlear Tokens
Greta Tuckute
Klemen Kotar
Evelina Fedorenko
Daniel L. K. Yamins
109
0
0
15 Aug 2025
How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Hyunji Lee
Danni Liu
Supriti Sinhamahapatra
Jan Niehues
405
4
0
21 Feb 2025
Towards Maximum Likelihood Training for Transducer-based Streaming Speech Recognition
IEEE Signal Processing Letters (SPL), 2024
Hyeonseung Lee
J. Yoon
Sungsoo Kim
N. Kim
267
0
0
26 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
293
5
0
31 Oct 2024
BiSSL: Enhancing the Alignment Between Self-Supervised Pretraining and Downstream Fine-Tuning via Bilevel Optimization
Gustav Wagner Zakarias
Lars Kai Hansen
Zheng-Hua Tan
331
0
0
03 Oct 2024
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Huang-Cheng Chou
Haibin Wu
Hung-yi Lee
Chi-Chun Lee
338
3
0
16 Sep 2024
NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training
Minglun Han
Ye Bai
Chen Shen
Youjia Huang
Mingkun Huang
Zehua Lin
Linhao Dong
Lu Lu
Yuxuan Wang
205
2
0
13 Sep 2024
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
Spoken Language Technology Workshop (SLT), 2024
Andy T. Liu
Yi-Cheng Lin
Haibin Wu
Stefan Winkler
Hung-yi Lee
268
4
0
09 Sep 2024
Progressive Residual Extraction based Pre-training for Speech Representation Learning
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Tianrui Wang
Jin Li
Ziyang Ma
Rui Cao
Xie Chen
...
Meng Ge
Xiaobao Wang
Yuguang Wang
Jianwu Dang
Nyima Tashi
SSL
266
3
0
31 Aug 2024
Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology
Weinan Dai
Yifeng Jiang
Yuanjing Liu
Jinkun Chen
Xin Sun
Jinglei Tao
SSL
148
1
0
31 Aug 2024
Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
288
0
0
20 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
L. Wang
Jianwu Dang
Jianhua Tao
AI4TS
281
7
0
11 Aug 2024
Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect
Salima Mdhaffar
Haroun Elleuch
Fethi Bougares
Yannick Esteve
292
4
0
05 Jul 2024
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Varun Krishna
Sriram Ganapathy
SSL
209
2
0
02 Jul 2024
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
Interspeech (Interspeech), 2024
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
281
2
0
09 Jun 2024
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
Interspeech (Interspeech), 2024
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
233
4
0
08 Jun 2024
Using Self-supervised Learning Can Improve Model Fairness
Sofia Yfantidou
Dimitris Spathis
Marios Constantinides
Athena Vakali
Daniele Quercia
F. Kawsar
293
8
0
04 Jun 2024
Alternators For Sequence Modeling
Mohammad Reza Rezaei
Adji Bousso Dieng
199
2
0
20 May 2024
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
Siavash Shams
Sukru Samet Dindar
Xilin Jiang
N. Mesgarani
Mamba
260
38
0
20 May 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
232
55
0
15 Apr 2024
Mai Hoʻomāuna i ka ʻAi: Language Models Improve Automatic Speech Recognition in Hawaiian
Kaavya Chaparala
Guido Zarrella
Bruce Torres Fischer
Larry Kimura
Oiwi Parker Jones
AuLLM
145
0
0
03 Apr 2024
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
Haibin Wu
Huang-Cheng Chou
Kai-Wei Chang
Lucas Goncalves
Jiawei Du
Jyh-Shing Roger Jang
Chi-Chun Lee
Hung-yi Lee
339
19
0
20 Feb 2024
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
Hsuan-Fu Wang
Yi-Jen Shih
Heng-Jui Chang
Layne Berry
Puyuan Peng
Hung-yi Lee
Hsin-Min Wang
David Harwath
VLM
170
6
0
10 Feb 2024
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification
Calum Heggan
S. Budgett
Timothy M. Hospedales
Mehrdad Yaghoobi
SSL
254
3
0
02 Feb 2024
What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis
Takanori Ashihara
Marc Delcroix
Takafumi Moriya
Kohei Matsuura
Taichi Asami
Yusuke Ijima
SSL
243
16
0
31 Jan 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhichao Wang
Yuan-Jui Chen
Xinsheng Wang
Lei Xie
Yuping Wang
285
12
0
19 Jan 2024
Evaluating Fairness in Self-supervised and Supervised Models for Sequential Data
Sofia Yfantidou
Dimitris Spathis
Marios Constantinides
Athena Vakali
Daniele Quercia
F. Kawsar
291
3
0
03 Jan 2024
Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
H. S. Bovbjerg
Jesper Jensen
Jan Østergaard
Zheng-Hua Tan
VLM
237
8
0
27 Dec 2023
Acoustic models of Brazilian Portuguese Speech based on Neural Transformers
M. Gauy
Marcelo Finger
108
2
0
14 Dec 2023
Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Bing Yang
Xiaofei Li
SSL
291
4
0
01 Dec 2023
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
International Conference on Natural Language and Speech Processing (ICNLSP), 2023
Shuyue Stella Li
Beining Xu
Xiangyu Zhang
Hexin Liu
Wen-Han Chao
Leibny Paola García
SSL
124
4
0
27 Nov 2023
Multi-objective Non-intrusive Hearing-aid Speech Assessment Model
Hsin-Tien Chiang
Szu-Wei Fu
Hsin-Min Wang
Yu Tsao
John H. L. Hansen
176
8
0
15 Nov 2023
Towards Matching Phones and Speech Representations
Automatic Speech Recognition & Understanding (ASRU), 2023
Gene-Ping Yang
Hao Tang
SSL
191
1
0
26 Oct 2023
Self-Supervised Representation Learning for Online Handwriting Text Classification
Pouya Mehralian
Bagher Babaali
Ashena Gorgan Mohammadi
SSL
162
2
0
10 Oct 2023
DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ziqian Ning
Yuepeng Jiang
Pengcheng Zhu
Shuai Wang
Jixun Yao
Linfu Xie
Mengxiao Bi
256
8
0
27 Sep 2023
Reduce, Reuse, Recycle: Is Perturbed Data better than Other Language augmentation for Low Resource Self-Supervised Speech Models
Interspeech (Interspeech), 2023
Asad Ullah
Alessandro Ragano
Andrew Hines
388
4
0
22 Sep 2023
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Computer Speech and Language (CSL), 2023
Titouan Parcollet
H. Nguyen
Solène Evain
Marcely Zanon Boito
Adrien Pupier
...
François Portet
Solange Rossato
Fabien Ringeval
D. Schwab
Laurent Besacier
240
25
0
11 Sep 2023
Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction
Yusuf Brima
U. Krumnack
Simone Pika
Gunther Heidemann
SSL
240
0
0
07 Sep 2023
Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
Sarthak Kumar Maharana
Krishna Kamal Adidam
Shoumik Nandi
Ajitesh Srivastava
368
5
0
03 Sep 2023
Self-Supervised Learning for Audio-Based Emotion Recognition
Peranut Nimitsurachat
Peter Washington
186
3
0
23 Jul 2023
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Varun Krishna
T. Sai
Sriram Ganapathy
SSL
148
3
0
14 Jul 2023
On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
Interspeech (Interspeech), 2023
Gene-Ping Yang
Yue Gu
Qingming Tang
Dongsu Du
Yuzong Liu
153
6
0
06 Jul 2023
Evaluation of Speech Representations for MOS prediction
International Conference on Text, Speech and Dialogue (TSD), 2023
F. S. Oliveira
Edresson Casanova
Arnaldo Cândido Júnior
L. Gris
A. S. Soares
A. R. G. Filho
117
4
0
16 Jun 2023
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation
Interspeech (Interspeech), 2023
Ziyang Ma
Zhisheng Zheng
Guanrou Yang
Yu Wang
Chuxu Zhang
Xie Chen
SSL
132
11
0
15 Jun 2023
Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement
Interspeech (Interspeech), 2023
Hejung Yang
Hong-Goo Kang
SSL
152
1
0
14 Jun 2023
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics
Interspeech (Interspeech), 2023
Joonyong Park
Shinnosuke Takamichi
Tomohiko Nakamura
Kentaro Seki
Detai Xin
Hiroshi Saruwatari
AuLLM
93
3
0
01 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
International Conference on Learning Representations (ICLR), 2023
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan
210
14
0
01 Jun 2023