ResearchTrend.AI
arXiv: 2011.00593
MixKD: Towards Efficient Distillation of Large-scale Language Models

1 November 2020
Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Papers citing "MixKD: Towards Efficient Distillation of Large-scale Language Models"

13 / 13 papers shown
  1. Self-Data Distillation for Recovering Quality in Pruned Large Language Models
     Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie
     13 Oct 2024 [SyDa]

  2. HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
     Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang
     02 Oct 2024

  3. LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
     Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao
     20 Jun 2023

  4. HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
     Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao
     19 Feb 2023 [VLM]

  5. Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
     Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun
     03 Feb 2023

  6. Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
     Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu
     01 Feb 2023

  7. GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
     Samrudhdhi B. Rangrej, Kevin J Liang, Tal Hassner, James J. Clark
     24 Oct 2022

  8. Not to Overfit or Underfit the Source Domains? An Empirical Study of Domain Generalization in Question Answering
     Md Arafat Sultan, Avirup Sil, Radu Florian
     15 May 2022 [OOD]

  9. Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
     Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti
     15 Jan 2022

  10. R-Drop: Regularized Dropout for Neural Networks
      Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, M. Zhang, Tie-Yan Liu
      28 Jun 2021

  11. CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding
      Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen
      16 Oct 2020

  12. Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization
      Saehyung Lee, Hyungyu Lee, Sungroh Yoon
      05 Mar 2020 [AAML]

  13. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
      Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
      20 Apr 2018 [ELM]