An Unsupervised Autoregressive Model for Speech Representation Learning

5 April 2019

Hao Tang

Papers citing "An Unsupervised Autoregressive Model for Speech Representation Learning"

50 / 269 papers shown

Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models

195

14 Oct 2025

Learning Robust Spatial Representations from Binaural Audio through Feature Distillation

116

28 Aug 2025

EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition

Hugo Thimonier

Antony Perzo

Renaud Seguier

140

19 Aug 2025

Representing Speech Through Autoregressive Prediction of Cochlear Tokens

121

15 Aug 2025

How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Hyunji Lee

Danni Liu

Supriti Sinhamahapatra

Jan Niehues

421

21 Feb 2025

Towards Maximum Likelihood Training for Transducer-based Streaming Speech Recognition

IEEE Signal Processing Letters (SPL), 2024

287

26 Nov 2024

DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models

305

31 Oct 2024

BiSSL: Enhancing the Alignment Between Self-Supervised Pretraining and Downstream Fine-Tuning via Bilevel Optimization

Gustav Wagner Zakarias

Lars Kai Hansen

Zheng-Hua Tan

363

03 Oct 2024

Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

398

16 Sep 2024

NEST-RQ: Next Token Prediction for Speech Self-Supervised Pre-Training

Minglun Han

Ye Bai

Chen Shen

Youjia Huang

Mingkun Huang

Zehua Lin

Linhao Dong

Lu Lu

Yuxuan Wang

222

13 Sep 2024

Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget

Spoken Language Technology Workshop (SLT), 2024

Andy T. Liu

Yi-Cheng Lin

Haibin Wu

Stefan Winkler

Hung-yi Lee

320

09 Sep 2024

Progressive Residual Extraction based Pre-training for Speech Representation Learning

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024

Tianrui Wang

Jin Li

Ziyang Ma

Rui Cao

Xie Chen

...

Meng Ge

Xiaobao Wang

Yuguang Wang

Jianwu Dang

Nyima Tashi

SSL

290

31 Aug 2024

Contrastive Augmentation: An Unsupervised Learning Approach for Keyword Spotting in Speech Technology

170

31 Aug 2024

Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation

304

20 Aug 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024

...

317

11 Aug 2024

Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect

304

05 Jul 2024

Towards the Next Frontier in Speech Representation Learning Using Disentanglement

Varun Krishna

Sriram Ganapathy

SSL

258

02 Jul 2024

MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations

Interspeech (Interspeech), 2024

285

09 Jun 2024

DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models

Interspeech (Interspeech), 2024

Tzu-Quan Lin

Hung-yi Lee

Hao Tang

273

08 Jun 2024

Using Self-supervised Learning Can Improve Model Fairness

Sofia Yfantidou

Dimitris Spathis

Marios Constantinides

Athena Vakali

Daniele Quercia

F. Kawsar

313

04 Jun 2024

Alternators For Sequence Modeling

Mohammad Reza Rezaei

Adji Bousso Dieng

219

20 May 2024

SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

Xilin Jiang

316

20 May 2024

A Large-Scale Evaluation of Speech Foundation Models

...

Shinji Watanabe

Hung-yi Lee

272

15 Apr 2024

Mai Hoʻomāuna i ka ʻAi: Language Models Improve Automatic Speech Recognition in Hawaiian

153

03 Apr 2024

EMO-SUPERB: An In-depth Look at Speech Emotion Recognition

Haibin Wu

Jiawei Du

Chi-Chun Lee

Hung-Yi Lee

391

20 Feb 2024

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Hung-yi Lee

182

10 Feb 2024

On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification

Calum Heggan

S. Budgett

Timothy M. Hospedales

Mehrdad Yaghoobi

SSL

302

02 Feb 2024

What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis

267

31 Jan 2024

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Lei Xie

289

19 Jan 2024

Evaluating Fairness in Self-supervised and Supervised Models for Sequential Data

Sofia Yfantidou

Dimitris Spathis

Marios Constantinides

Athena Vakali

Daniele Quercia

F. Kawsar

317

03 Jan 2024

Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions

249

27 Dec 2023

Acoustic models of Brazilian Portuguese Speech based on Neural Transformers

M. Gauy

Marcelo Finger

137

14 Dec 2023

Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Bing Yang

Xiaofei Li

SSL

307

01 Dec 2023

A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

International Conference on Natural Language and Speech Processing (ICNLSP), 2023

Xiangyu Zhang

152

27 Nov 2023

Multi-objective Non-intrusive Hearing-aid Speech Assessment Model

Yu Tsao

200

15 Nov 2023

Towards Matching Phones and Speech Representations

Automatic Speech Recognition & Understanding (ASRU), 2023

Gene-Ping Yang

Hao Tang

SSL

195

26 Oct 2023

Self-Supervised Representation Learning for Online Handwriting Text Classification

Pouya Mehralian

Bagher Babaali

Ashena Gorgan Mohammadi

SSL

170

10 Oct 2023

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Ziqian Ning

Yuepeng Jiang

Pengcheng Zhu

Shuai Wang

Jixun Yao

Linfu Xie

Mengxiao Bi

284

27 Sep 2023

Reduce, Reuse, Recycle: Is Perturbed Data better than Other Language augmentation for Low Resource Self-Supervised Speech Models

Interspeech (Interspeech), 2023

Asad Ullah

Alessandro Ragano

Andrew Hines

416

22 Sep 2023

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

Computer Speech and Language (CSL), 2023

...

256

11 Sep 2023

Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction

257

07 Sep 2023

Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?

Sarthak Kumar Maharana

Krishna Kamal Adidam

Shoumik Nandi

Ajitesh Srivastava

380

03 Sep 2023

Self-Supervised Learning for Audio-Based Emotion Recognition

Peranut Nimitsurachat

Peter Washington

194

23 Jul 2023

Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Varun Krishna

T. Sai

Sriram Ganapathy

SSL

160

14 Jul 2023

On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation

Interspeech (Interspeech), 2023

153

06 Jul 2023

Evaluation of Speech Representations for MOS prediction

International Conference on Text, Speech and Dialogue (TSD), 2023

F. S. Oliveira

Edresson Casanova

Arnaldo Cândido Júnior

L. Gris

A. S. Soares

A. R. G. Filho

125

16 Jun 2023

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation

Interspeech (Interspeech), 2023

Xie Chen

153

15 Jun 2023

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

Interspeech (Interspeech), 2023

Hejung Yang

Hong-Goo Kang

SSL

160

14 Jun 2023

How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics

Interspeech (Interspeech), 2023

Hiroshi Saruwatari

105

01 Jun 2023

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

International Conference on Learning Representations (ICLR), 2023

Sarthak Yadav

Sergios Theodoridis

Lars Kai Hansen

Zheng-Hua Tan

250

01 Jun 2023
