DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

17 May 2023

Papers citing "DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning"


Title | Authors | Tags | Counts | Date
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | Yemin Shi, Yu Shu, Siwei Dong, Guangyi Liu, Jaward Sesay, Jingwen Li, Zhiting Hu | AuLLM, VLM | 43 0 0 | 05 May 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation | Alexander H. Liu, Sang-gil Lee, Chao-Han Huck Yang, Yuan Gong, Yu-Chun Wang, James Glass, Rafael Valle, Bryan Catanzaro | SSL | 42 0 0 | 02 Mar 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Sungnyun Kim, Sungwoo Cho, Sangmin Bae, Kangwook Jang, Se-Young Yun | SSL | 68 1 0 | 23 Jan 2025
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models | Heng-Jui Chang, Hongyu Gong, Changhan Wang, James R. Glass, Yu-An Chung | | 26 0 0 | 31 Oct 2024
DM-Codec: Distilling Multimodal Representations for Speech Tokenization | Md Mubtasim Ahasan, Md Fahim, Tasnim Mohiuddin, A K M Mahbubur Rahman, Aman Chadha, Tariq Iqbal, M. A. Amin, Md. Mofijul Islam, Amin Ahsan Ali | | 18 0 0 | 19 Oct 2024
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning | Ashish Seth, Ramaneswaran Selvakumar, S. Sakshi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha | | 24 0 0 | 17 Oct 2024
SSR: Alignment-Aware Modality Connector for Speech Language Models | Weiting Tan, Hirofumi Inaguma, Ning Dong, Paden Tomasello, Xutai Ma | | 22 3 0 | 30 Sep 2024
Advanced Clustering Techniques for Speech Signal Enhancement: A Review and Metanalysis of Fuzzy C-Means, K-Means, and Kernel Fuzzy C-Means Methods | Abdulhady Abas Abdullah, A. Ahmed, Tarik Rashid, Hadi Veisi, Yassin Hussein Rassul, B. Hassan, Polla Fattah, Sabat Abdulhameed Ali, Ahmed S. Shamsaldin | | 19 1 0 | 28 Sep 2024
SSDM: Scalable Speech Dysfluency Modeling | Jiachen Lian, Xuanru Zhou, Z. Ezzes, Jet M J Vonk, Brittany Morin, D. Baquirin, Zachary Mille, M. G. Tempini, Gopala Anumanchipalli | AuLLM | 30 1 0 | 29 Aug 2024
Towards Robust Speech Representation Learning for Thousands of Languages | William Chen, Wangyou Zhang, Yifan Peng, Xinjian Li, Jinchuan Tian, Jiatong Shi, Xuankai Chang, Soumi Maiti, Karen Livescu, Shinji Watanabe | ELM | 33 6 0 | 30 Jun 2024
Sustainable self-supervised learning for speech representations | Luis Lugo, Valentin Vielzeuf | | 29 2 0 | 11 Jun 2024
The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning | Nik Vaessen, David A. van Leeuwen | | 20 3 0 | 21 Feb 2024
Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective | Alexander H. Liu, Sung-Lin Yeh, James R. Glass | SSL | 11 3 0 | 16 Jan 2024
Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning | Danwei Cai, Zexin Cai, Ming Li | | 17 0 0 | 03 Jan 2024
Efficiency-oriented approaches for self-supervised speech representation learning | Luis Lugo, Valentin Vielzeuf | SSL | 19 1 0 | 18 Dec 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces | Heng-Jui Chang, James R. Glass | | 15 3 0 | 15 Nov 2023
Are Soft Prompts Good Zero-shot Learners for Speech Recognition? | Dianwen Ng, Chong Zhang, Ruixi Zhang, Yukun Ma, Fabian Ritter Gutierrez, Trung Hieu Nguyen, Chongjia Ni, Shengkui Zhao, E. Chng, B. Ma | VLM | 19 1 0 | 18 Sep 2023
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations | Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh | SSL | 38 18 0 | 05 Oct 2022
Self-Supervised Speech Representation Learning: A Review | Abdel-rahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, ..., Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe | SSL, AI4TS | 124 339 0 | 21 May 2022
Emerging Properties in Self-Supervised Vision Transformers | Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin | | 283 5,723 0 | 29 Apr 2021