2011.00406
Cited By
Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies
Interspeech (Interspeech), 2020
1 November 2020
Alexander H. Liu
Yu-An Chung
James R. Glass
SSL
Papers citing
"Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies"
50 / 57 papers shown
Multilingual Dataset Integration Strategies for Robust Audio Deepfake Detection: A SAFE Challenge System
Hashim Ali
Surya Subramani
Lekha Bollinani
Nithin Sai Adupa
Sali El-Loh
Hafiz Malik
182
1
0
28 Aug 2025
EmoSLLM: Parameter-Efficient Adaptation of LLMs for Speech Emotion Recognition
Hugo Thimonier
Antony Perzo
Renaud Seguier
270
3
0
19 Aug 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Computer Vision and Pattern Recognition (CVPR), 2025
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
402
7
0
21 Mar 2025
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
321
9
0
07 Dec 2024
You Only Speak Once to See
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
VOS
330
6
0
27 Sep 2024
Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Huang-Cheng Chou
Haibin Wu
Hung-yi Lee
Chi-Chun Lee
622
4
0
16 Sep 2024
Efficient Training of Self-Supervised Speech Foundation Models on a Compute Budget
Spoken Language Technology Workshop (SLT), 2024
Andy T. Liu
Yi-Cheng Lin
Haibin Wu
Stefan Winkler
Hung-yi Lee
441
4
0
09 Sep 2024
Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval
Interspeech (Interspeech), 2024
Lifeng Zhou
Yuke Li
Rui Deng
Yuting Yang
Haoqi Zhu
269
2
0
15 Aug 2024
Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
Interspeech (Interspeech), 2024
Rui Liu
Zening Ma
SSL
396
2
0
10 Jun 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
334
64
0
15 Apr 2024
EMO-SUPERB: An In-depth Look at Speech Emotion Recognition
Haibin Wu
Huang-Cheng Chou
Kai-Wei Chang
Lucas Goncalves
Jiawei Du
Jyh-Shing Roger Jang
Chi-Chun Lee
Hung-yi Lee
490
22
0
20 Feb 2024
On the Transferability of Large-Scale Self-Supervision to Few-Shot Audio Classification
Calum Heggan
S. Budgett
Timothy M. Hospedales
Mehrdad Yaghoobi
SSL
380
3
0
02 Feb 2024
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
International Conference on Natural Language and Speech Processing (ICNLSP), 2023
Shuyue Stella Li
Beining Xu
Xiangyu Zhang
Hexin Liu
Wen-Han Chao
Leibny Paola García
SSL
243
5
0
27 Nov 2023
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuke Li
Xinfa Zhu
Yinjiao Lei
Hai Li
Junhui Liu
Danming Xie
Lei Xie
302
6
0
06 Oct 2023
Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
Sarthak Kumar Maharana
Krishna Kamal Adidam
Shoumik Nandi
Ajitesh Srivastava
498
6
0
03 Sep 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Interspeech (Interspeech), 2023
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
238
5
0
28 Aug 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
International Conference on Learning Representations (ICLR), 2023
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan
314
15
0
01 Jun 2023
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Interspeech (Interspeech), 2023
Wangyou Zhang
Y. Qian
289
12
0
25 May 2023
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
Interspeech (Interspeech), 2023
Eklavya Sarkar
Mathew Magimai.-Doss
296
18
0
23 May 2023
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
230
18
0
12 Mar 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Automatic Speech Recognition & Understanding (ASRU), 2023
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
440
46
0
10 Feb 2023
Dual Learning for Large Vocabulary On-Device ASR
Spoken Language Technology Workshop (SLT), 2023
Cal Peyser
Ronny Huang
Tara N. Sainath
Rohit Prabhavalkar
M. Picheny
K. Cho
SSL
206
1
0
11 Jan 2023
Introducing Semantics into Speech Encoders
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Derek Xu
Shuyan Dong
Changhan Wang
Suyoun Kim
Mohammad Kachuee
...
Alexei Baevski
Guan-Ting Lin
Hung-yi Lee
Luke Huan
Wei Wang
SSL
198
4
0
15 Nov 2022
Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Sathvik Udupa
Siddarth C
P. Ghosh
237
11
0
30 Oct 2022
Relating Human Perception of Musicality to Prediction in a Predictive Coding Model
Nikolas McNeal
Jennifer Huang
Aniekan Umoren
Shuqi Dai
Roger Dannenberg
R. Randall
T. Lee
159
0
0
29 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Spoken Language Technology Workshop (SLT), 2022
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
323
38
0
16 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
Spoken Language Technology Workshop (SLT), 2022
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen-An Li
Hung-yi Lee
Nigel G. Ward
235
65
0
13 Oct 2022
Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora
Spoken Language Technology Workshop (SLT), 2022
Yuanchao Li
Yumnah Mohamied
P. Bell
Catherine Lai
SSL
405
55
0
05 Oct 2022
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Spoken Language Technology Workshop (SLT), 2022
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David Harwath
VLM
CLIP
452
42
0
03 Oct 2022
SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zi-Hua Zhang
Sanyuan Chen
Long Zhou
Yu Wu
Shuo Ren
...
Zhuoyuan Yao
Xun Gong
Lirong Dai
Jinyu Li
Furu Wei
358
72
0
30 Sep 2022
End-to-End Lyrics Recognition with Self-supervised Learning
Xiangyu Zhang
Shuyue Stella Li
Zhanhong He
R. Togneri
Leibny Paola García
249
0
0
26 Sep 2022
Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Jaejin Cho
Jesús Villalba
Laureano Moro-Velazquez
Najim Dehak
SSL
250
23
0
10 Aug 2022
A Comparative Study of Self-supervised Speech Representation Based Voice Conversion
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Wen-Chin Huang
Shu-Wen Yang
Tomoki Hayashi
Tomoki Toda
222
24
0
10 Jul 2022
Self-Supervised Speech Representation Learning: A Review
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
782
475
0
21 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
IEEE Journal on Selected Topics in Signal Processing (IEEE JSTSP), 2022
Sameer Khurana
Antoine Laurent
James R. Glass
221
47
0
17 May 2022
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yisheng Xiao
Lijun Wu
Junliang Guo
Juntao Li
Hao Fei
Tao Qin
Tie-Yan Liu
3DV
MedIm
AI4CE
326
121
0
20 Apr 2022
Autoregressive Co-Training for Learning Discrete Speech Representations
Interspeech (Interspeech), 2022
Sung-Lin Yeh
Hao Tang
SSL
286
8
0
29 Mar 2022
Semi-FedSER: Semi-supervised Learning for Speech Emotion Recognition On Federated Learning using Multiview Pseudo-Labeling
Interspeech (Interspeech), 2022
Tiantian Feng
Shrikanth Narayanan
170
25
0
15 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Hsiang-Sheng Tsai
Heng-Jui Chang
Wen-Chin Huang
Zili Huang
Kushal Lakhotia
...
Hsuan-Jui Chen
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
337
127
0
14 Mar 2022
Audio Self-supervised Learning: A Survey
Patterns (Patterns), 2022
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabaleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
352
136
0
02 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
265
13
0
01 Mar 2022
Speaker Normalization for Self-supervised Speech Emotion Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Itai Gat
Hagai Aronowitz
Weizhong Zhu
E. Morais
R. Hoory
352
62
0
02 Feb 2022
Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings
Tiantian Feng
H. Hashemi
Rajat Hebbar
M. Annavaram
Shrikanth S. Narayanan
395
31
0
26 Dec 2021
A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
Yingzhi Wang
Abdelmoumene Boumadane
A. Heba
415
189
0
04 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
1.4K
2,950
0
26 Oct 2021
Word Order Does Not Matter For Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Vineel Pratap
Qiantong Xu
Tatiana Likhomanenko
Gabriel Synnaeve
R. Collobert
291
4
0
12 Oct 2021
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Sanyuan Chen
Yu Wu
Chengyi Wang
Zhengyang Chen
Zhuo Chen
...
Jian Wu
Yao Qian
Furu Wei
Jinyu Li
Xiangzhan Yu
SSL
274
130
0
12 Oct 2021
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Automatic Speech Recognition & Understanding (ASRU), 2021
Xuankai Chang
Takashi Maekaku
Pengcheng Guo
Jing Shi
Yen-Ju Lu
...
Tianzi Wang
Shu-Wen Yang
Yu Tsao
Hung-yi Lee
Shinji Watanabe
SSL
AI4TS
238
88
0
09 Oct 2021
Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models
Liang-Hsuan Tseng
Yu-Kuan Fu
Heng-Jui Chang
Hung-yi Lee
SSL
173
18
0
07 Oct 2021
DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
Heng-Jui Chang
Shu-Wen Yang
Hung-yi Lee
SSL
825
210
0
05 Oct 2021