
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives

24 May 2023 (arXiv: 2305.15032)
Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank

Papers citing "How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives"

3 papers shown

Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Färber
18 Feb 2024

Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi, Takuma Udagawa, Michele Merler, Rameswar Panda, Yousef El-Kurdi, Bishwaranjan Bhattacharjee
16 Mar 2023

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018