Weight-Inherited Distillation for Task-Agnostic BERT Compression
arXiv: 2305.09098 · 16 May 2023
Taiqiang Wu, Cheng-An Hou, Shanshan Lao, Jiayi Li, Ngai Wong, Zhe Zhao, Yujiu Yang
Papers citing "Weight-Inherited Distillation for Task-Agnostic BERT Compression" (5 papers)
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu
24 Sep 2024
Edge-free but Structure-aware: Prototype-Guided Knowledge Distillation from GNNs to MLPs
Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang
24 Mar 2023
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
07 Feb 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
26 Sep 2016