Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models
arXiv: 2207.06867
14 July 2022
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka

Papers citing "Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models" (7 of 7 papers shown)

Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner, Ilja Baumann, K. Riedhammer, Tobias Bocklet
16 Jun 2024

Sustainable self-supervised learning for speech representations
Luis Lugo, Valentin Vielzeuf
11 Jun 2024

Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo, Valentin Vielzeuf
18 Dec 2023

Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Kangwook Jang, Sungnyun Kim, Se-Young Yun, Hoi-Rim Kim
19 May 2023

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass
17 May 2023

Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka
09 May 2023

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020