Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

Interspeech (Interspeech), 2020
1 November 2020
Alexander H. Liu
Yu-An Chung
James R. Glass
    SSL

Papers citing "Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies"

50 / 57 papers shown
Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System
Hashim Ali
Surya Subramani
Lekha Bollinani
Nithin Sai Adupa
Sali El-Loh
Hafiz Malik
182
1
0
28 Aug 2025
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
Hugo Thimonier
Antony Perzo
Renaud Seguier
270
3
0
19 Aug 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Computer Vision and Pattern Recognition (CVPR), 2025
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
402
7
0
21 Mar 2025
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
321
9
0
07 Dec 2024
You Only Speak Once to See
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
VOS
330
6
0
27 Sep 2024
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Huang-Cheng Chou
Haibin Wu
Hung-yi Lee
Chi-Chun Lee
622
4
0
16 Sep 2024
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
Spoken Language Technology Workshop (SLT), 2024
Andy T. Liu
Yi-Cheng Lin
Haibin Wu
Stefan Winkler
Hung-yi Lee
441
4
0
09 Sep 2024
Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval
Interspeech (Interspeech), 2024
Lifeng Zhou
Yuke Li
Rui Deng
Yuting Yang
Haoqi Zhu
269
2
0
15 Aug 2024
Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
Interspeech (Interspeech), 2024
Rui Liu
Zening Ma
SSL
396
2
0
10 Jun 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
334
64
0
15 Apr 2024
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
Haibin Wu
Huang-Cheng Chou
Kai-Wei Chang
Lucas Goncalves
Jiawei Du
Jyh-Shing Roger Jang
Chi-Chun Lee
Hung-yi Lee
490
22
0
20 Feb 2024
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification
Calum Heggan
S. Budgett
Timothy M. Hospedales
Mehrdad Yaghoobi
SSL
380
3
0
02 Feb 2024
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
International Conference on Natural Language and Speech Processing (ICNLSP), 2023
Shuyue Stella Li
Beining Xu
Xiangyu Zhang
Hexin Liu
Wen-Han Chao
Leibny Paola García
SSL
243
5
0
27 Nov 2023
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuke Li
Xinfa Zhu
Yinjiao Lei
Hai Li
Junhui Liu
Danming Xie
Lei Xie
302
6
0
06 Oct 2023
Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
Sarthak Kumar Maharana
Krishna Kamal Adidam
Shoumik Nandi
Ajitesh Srivastava
498
6
0
03 Sep 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Interspeech (Interspeech), 2023
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
238
5
0
28 Aug 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
International Conference on Learning Representations (ICLR), 2023
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan
314
15
0
01 Jun 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Interspeech (Interspeech), 2023
Wangyou Zhang
Y. Qian
289
12
0
25 May 2023
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
Interspeech (Interspeech), 2023
Eklavya Sarkar
Mathew Magimai.-Doss
296
18
0
23 May 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
230
18
0
12 Mar 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Automatic Speech Recognition & Understanding (ASRU), 2023
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
440
46
0
10 Feb 2023
Dual Learning for Large Vocabulary On-Device ASR
Spoken Language Technology Workshop (SLT), 2023
Cal Peyser
Ronny Huang
Tara N. Sainath
Rohit Prabhavalkar
M. Picheny
K. Cho
SSL
206
1
0
11 Jan 2023
Introducing Semantics into Speech Encoders
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Derek Xu
Shuyan Dong
Changhan Wang
Suyoun Kim
Mohammad Kachuee
...
Alexei Baevski
Guan-Ting Lin
Hung-yi Lee
Luke Huan
Wei Wang
SSL
198
4
0
15 Nov 2022
Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Sathvik Udupa
Siddarth C
P. Ghosh
237
11
0
30 Oct 2022
Relating Human Perception of Musicality to Prediction in a Predictive Coding Model
Nikolas McNeal
Jennifer Huang
Aniekan Umoren
Shuqi Dai
Roger Dannenberg
R. Randall
T. Lee
159
0
0
29 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Spoken Language Technology Workshop (SLT), 2022
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
323
38
0
16 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
Spoken Language Technology Workshop (SLT), 2022
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen-An Li
Hung-yi Lee
Nigel G. Ward
235
65
0
13 Oct 2022
Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora
Spoken Language Technology Workshop (SLT), 2022
Yuanchao Li
Yumnah Mohamied
P. Bell
Catherine Lai
SSL
405
55
0
05 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Spoken Language Technology Workshop (SLT), 2022
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David Harwath
VLM
CLIP
452
42
0
03 Oct 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
358
72
0
30 Sep 2022
End-to-End Lyrics Recognition with Self-supervised Learning
Xiangyu Zhang
Shuyue Stella Li
Zhanhong He
R. Togneri
Leibny Paola García
249
0
0
26 Sep 2022
Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Jaejin Cho
Jesús Villalba
Laureano Moro-Velazquez
Najim Dehak
SSL
250
23
0
10 Aug 2022
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Wen-Chin Huang
Shu-Wen Yang
Tomoki Hayashi
Tomoki Toda
222
24
0
10 Jul 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
782
475
0
21 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Sameer Khurana
Antoine Laurent
James R. Glass
221
47
0
17 May 2022
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yisheng Xiao
Lijun Wu
Junliang Guo
Juntao Li
Hao Fei
Tao Qin
Tie-Yan Liu
3DV
MedIm
AI4CE
326
121
0
20 Apr 2022
Autoregressive Co-Training for Learning Discrete Speech Representations
Interspeech (Interspeech), 2022
Sung-Lin Yeh
Hao Tang
SSL
286
8
0
29 Mar 2022
Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling
Interspeech (Interspeech), 2022
Tiantian Feng
Shrikanth Narayanan
170
25
0
15 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Hsiang-Sheng Tsai
Heng-Jui Chang
Wen-Chin Huang
Zili Huang
Kushal Lakhotia
...
Hsuan-Jui Chen
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
337
127
0
14 Mar 2022
Audio Self-supervised Learning: A Survey
Patterns (Patterns), 2022
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabaleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
352
136
0
02 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
265
13
0
01 Mar 2022
Speaker Normalization for Self-supervised Speech Emotion Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Itai Gat
Hagai Aronowitz
Weizhong Zhu
E. Morais
R. Hoory
352
62
0
02 Feb 2022
Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings
Tiantian Feng
H. Hashemi
Rajat Hebbar
M. Annavaram
Shrikanth S. Narayanan
395
31
0
26 Dec 2021
A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
Yingzhi Wang
Abdelmoumene Boumadane
A. Heba
415
189
0
04 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
1.4K
2,950
0
26 Oct 2021
Word Order Does Not Matter For Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Vineel Pratap
Qiantong Xu
Tatiana Likhomanenko
Gabriel Synnaeve
R. Collobert
291
4
0
12 Oct 2021
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Sanyuan Chen
Yu Wu
Chengyi Wang
Zhengyang Chen
Zhuo Chen
...
Jian Wu
Yao Qian
Furu Wei
Jinyu Li
Xiangzhan Yu
SSL
274
130
0
12 Oct 2021
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Automatic Speech Recognition & Understanding (ASRU), 2021
Xuankai Chang
Takashi Maekaku
Pengcheng Guo
Jing Shi
Yen-Ju Lu
...
Tianzi Wang
Shu-Wen Yang
Yu Tsao
Hung-yi Lee
Shinji Watanabe
SSL
AI4TS
238
88
0
09 Oct 2021
Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models
Liang-Hsuan Tseng
Yu-Kuan Fu
Heng-Jui Chang
Hung-yi Lee
SSL
173
18
0
07 Oct 2021
DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
Heng-Jui Chang
Shu-Wen Yang
Hung-yi Lee
SSL
825
210
0
05 Oct 2021