Gradient Knowledge Distillation for Pre-trained Language Models [VLM]
Lean Wang, Lei Li, Xu Sun
arXiv:2211.01071, 2 November 2022

Papers citing "Gradient Knowledge Distillation for Pre-trained Language Models" (5 papers):

1. Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models [VLM]
   Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, ..., Y. Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu
   18 Apr 2025

2. Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models [SyDa]
   Y. Fu, Yin Yu, Xiaotian Han, Runchao Li, Xianxuan Long, Haotian Yu, Pan Li
   25 Nov 2024

3. Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights
   Mohamad Ballout, U. Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger
   19 Sep 2024

4. Indirect Gradient Matching for Adversarial Robust Distillation [AAML, FedML]
   Hongsin Lee, Seungju Cho, Changick Kim
   06 Dec 2023

5. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding [ELM]
   Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
   20 Apr 2018