Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

Interspeech (Interspeech), 2020

1 November 2020

Papers citing "Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies"

Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System

182

28 Aug 2025

EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition

Hugo Thimonier

Antony Perzo

Renaud Seguier

270

19 Aug 2025

From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech

Computer Vision and Pattern Recognition (CVPR), 2025

402

21 Mar 2025

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024

321

07 Dec 2024

You Only Speak Once to See

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Wenhao Yang

Jianguo Wei

Wenhuan Lu

Lei Li

VOS

330

27 Sep 2024

Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

622

16 Sep 2024

Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget

Spoken Language Technology Workshop (SLT), 2024

Andy T. Liu

Yi-Cheng Lin

Haibin Wu

Stefan Winkler

Hung-yi Lee

441

09 Sep 2024

Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval

Interspeech (Interspeech), 2024

Yuke Li

269

15 Aug 2024

Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge

Interspeech (Interspeech), 2024

Rui Liu

Zening Ma

SSL

396

10 Jun 2024

A Large-Scale Evaluation of Speech Foundation Models

...

Shinji Watanabe

Hung-yi Lee

334

15 Apr 2024

EMO-SUPERB: An In-depth Look at Speech Emotion Recognition

Haibin Wu

Jiawei Du

Chi-Chun Lee

Hung-Yi Lee

490

20 Feb 2024

On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification

Calum Heggan

S. Budgett

Timothy M. Hospedales

Mehrdad Yaghoobi

SSL

380

02 Feb 2024

A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

International Conference on Natural Language and Speech Processing (ICNLSP), 2023

Xiangyu Zhang

243

27 Nov 2023

Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis

Automatic Speech Recognition & Understanding (ASRU), 2023

Lei Xie

302

06 Oct 2023

Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?

Sarthak Kumar Maharana

Krishna Kamal Adidam

Shoumik Nandi

Ajitesh Srivastava

498

03 Sep 2023

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

Interspeech (Interspeech), 2023

Zhisheng Zheng

Ziyang Ma

Yu Wang

Xie Chen

238

28 Aug 2023

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

International Conference on Learning Representations (ICLR), 2023

Sarthak Yadav

Sergios Theodoridis

Lars Kai Hansen

Zheng-Hua Tan

314

01 Jun 2023

Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition

Interspeech (Interspeech), 2023

Wangyou Zhang

Y. Qian

289

25 May 2023

Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?

Interspeech (Interspeech), 2023

Eklavya Sarkar

Mathew Magimai.-Doss

296

23 May 2023

Accommodating Audio Modality in CLIP for Multimodal Processing

AAAI Conference on Artificial Intelligence (AAAI), 2023

Qin Jin

230

12 Mar 2023

AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations

Automatic Speech Recognition & Understanding (ASRU), 2023

440

10 Feb 2023

Dual Learning for Large Vocabulary On-Device ASR

Spoken Language Technology Workshop (SLT), 2023

206

11 Jan 2023

Introducing Semantics into Speech Encoders

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

...

Guan-Ting Lin

198

15 Nov 2022

Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Sathvik Udupa

Siddarth C

P. Ghosh

237

30 Oct 2022

Relating Human Perception of Musicality to Prediction in a Predictive Coding Model

Roger Dannenberg

159

29 Oct 2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Spoken Language Technology Workshop (SLT), 2022

Tzu-Quan Lin

...

323

16 Oct 2022

On the Utility of Self-supervised Models for Prosody-related Tasks

Spoken Language Technology Workshop (SLT), 2022

Guan-Ting Lin

235

13 Oct 2022

Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora

Spoken Language Technology Workshop (SLT), 2022

405

05 Oct 2022

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Spoken Language Technology Workshop (SLT), 2022

452

03 Oct 2022

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

...

358

30 Sep 2022

End-to-End Lyrics Recognition with Self-supervised Learning

Xiangyu Zhang

249

26 Sep 2022

Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

Jaejin Cho

Jesús Villalba

Laureano Moro-Velazquez

Najim Dehak

SSL

250

10 Aug 2022

A Comparative Study of Self-supervised Speech Representation Based Voice Conversion

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

222

10 Jul 2022

Self-Supervised Speech Representation Learning: A Review

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

Abdel-rahman Mohamed

Hung-yi Lee

Lasse Borgholt

Jakob Drachmann Havtorn

...

782

475

21 May 2022

SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation

IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022

Sameer Khurana

Antoine Laurent

James R. Glass

221

17 May 2022

A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

Junliang Guo

326

121

20 Apr 2022

Autoregressive Co-Training for Learning Discrete Speech Representations

Interspeech (Interspeech), 2022

Sung-Lin Yeh

Hao Tang

SSL

286

29 Mar 2022

Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling

Interspeech (Interspeech), 2022

Tiantian Feng

Shrikanth Narayanan

170

15 Mar 2022

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

...

337

127

14 Mar 2022

Audio Self-supervised Learning: A Survey

Patterns (Patterns), 2022

Shuo Liu

Adria Mallol-Ragolta

Emilia Parada-Cabeleiro

Kun Qian

Bjoern W. Schuller

352

136

02 Mar 2022

A Brief Overview of Unsupervised Neural Speech Representation Learning

Lasse Borgholt

Jakob Drachmann Havtorn

265

01 Mar 2022

Speaker Normalization for Self-supervised Speech Emotion Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

352

02 Feb 2022

Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings

Shrikanth S. Narayanan

395

26 Dec 2021

A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding

Yingzhi Wang

Abdelmoumene Boumadane

A. Heba

415

189

04 Nov 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

...

Jian Wu

1.4K

2,950

26 Oct 2021

Word Order Does Not Matter For Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

291

12 Oct 2021

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

...

Jian Wu

274

130

12 Oct 2021

An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition

Automatic Speech Recognition & Understanding (ASRU), 2021

...

Tianzi Wang

238

09 Oct 2021

Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models

173

07 Oct 2021

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

825

210

05 Oct 2021