Knowledge Distillation from Internal Representations

8 October 2019

Papers citing "Knowledge Distillation from Internal Representations"

23 / 23 papers shown

Title
Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom Rishika Sen Sujoy Roychowdhury Sumit Soman H. G. Ranjani Srikhetra Mohanty 66 0 0 28 Apr 2025
Data Duplication: A Novel Multi-Purpose Attack Paradigm in Machine Unlearning Dayong Ye Tainqing Zhu J. Li Kun Gao B. Liu L. Zhang Wanlei Zhou Y. Zhang AAML MU 80 0 0 28 Jan 2025
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity Mutian He Philip N. Garner 80 0 0 09 Oct 2024
Factual Dialogue Summarization via Learning from Large Language Models Rongxin Zhu Jey Han Lau Jianzhong Qi HILM 48 1 0 20 Jun 2024
CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective Wencheng Zhu Xin Zhou Pengfei Zhu Yu Wang Qinghua Hu VLM 56 1 0 22 Apr 2024
Teacher-Student Architecture for Knowledge Distillation: A Survey Chengming Hu Xuan Li Danyang Liu Haolun Wu Xi Chen Ju Wang Xue Liu 21 16 0 08 Aug 2023
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model Shicheng Tan Weng Lam Tam Yuanchun Wang Wenwen Gong Yang Yang ... Jiahao Liu Jingang Wang Shuo Zhao Peng-Zhen Zhang Jie Tang ALM MoE 19 11 0 11 Jun 2023
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers Chen Liang Haoming Jiang Zheng Li Xianfeng Tang Bin Yin Tuo Zhao VLM 24 24 0 19 Feb 2023
Decentralized Learning with Multi-Headed Distillation A. Zhmoginov Mark Sandler Nolan Miller Gus Kristiansen Max Vladymyrov FedML 32 4 0 28 Nov 2022
Compressing Transformer-based self-supervised models for speech processing Tzu-Quan Lin Tsung-Huan Yang Chun-Yao Chang Kuang-Ming Chen Tzu-hsun Feng Hung-yi Lee Hao Tang 32 6 0 17 Nov 2022
Teacher-Student Architecture for Knowledge Learning: A Survey Chengming Hu Xuan Li Dan Liu Xi Chen Ju Wang Xue Liu 20 35 0 28 Oct 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction Samrudhdhi B. Rangrej Kevin J Liang Tal Hassner James J. Clark 27 3 0 24 Oct 2022
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT Rui Wang Qibing Bai Junyi Ao Long Zhou Zhixiang Xiong Zhihua Wei Yu Zhang Tom Ko Haizhou Li 28 61 0 29 Mar 2022
Unsupervised Domain Adaptive Person Re-Identification via Human Learning Imitation Yang Peng Ping Liu Yawei Luo Pan Zhou Zichuan Xu Jingen Liu OOD 18 0 0 28 Nov 2021
XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation Subhabrata Mukherjee Ahmed Hassan Awadallah Jianfeng Gao 17 22 0 08 Jun 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge Francisco Rivera Valverde Juana Valeria Hurtado Abhinav Valada 26 72 0 01 Mar 2021
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers Wenhui Wang Hangbo Bao Shaohan Huang Li Dong Furu Wei MQ 19 255 0 31 Dec 2020
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding Hao Fu Shaojun Zhou Qihong Yang Junjie Tang Guiquan Liu Kaikui Liu Xiaolong Li 31 57 0 14 Dec 2020
GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method Nicole Peinelt Marek Rei Maria Liakata 22 2 0 23 Oct 2020
Knowledge Distillation: A Survey Jianping Gou B. Yu Stephen J. Maybank Dacheng Tao VLM 19 2,835 0 09 Jun 2020
A Study on Efficiency, Accuracy and Document Structure for Answer Sentence Selection Daniele Bonadiman Alessandro Moschitti RALM 13 10 0 04 Mar 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers Wenhui Wang Furu Wei Li Dong Hangbo Bao Nan Yang Ming Zhou VLM 45 1,198 0 25 Feb 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 297 6,950 0 20 Apr 2018