Patient Knowledge Distillation for BERT Model Compression
arXiv:1908.09355, 25 August 2019
Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu
Papers citing "Patient Knowledge Distillation for BERT Model Compression" (50 of 491 shown):
Multi-Granularity Semantic Revision for Large Language Model Distillation
Xiaoyu Liu, Yun-feng Zhang, Wei Li, Simiao Li, Xu Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang. 14 Jul 2024.

Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas M. Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, V. Cevher, Yida Wang, George Karypis. 12 Jul 2024.

BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation
Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He. 12 Jul 2024.

Understanding the Gains from Repeated Self-Distillation
Divyansh Pareek, Simon S. Du, Sewoong Oh. 05 Jul 2024.

Croppable Knowledge Graph Embedding
Yushan Zhu, Wen Zhang, Zhiqiang Liu, Mingyang Chen, Lei Liang, Huajun Chen. 03 Jul 2024.

MLKD-BERT: Multi-level Knowledge Distillation for Pre-trained Language Models [KELM]
Ying Zhang, Ziheng Yang, Shufan Ji. 03 Jul 2024.
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application [ALM, KELM]
Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen. 02 Jul 2024.

FoldGPT: Simple and Effective Large Language Model Compression Scheme
Songwei Liu, Chao Zeng, Lianqiang Li, Chenqian Yan, Lean Fu, Xing Mei, Fangmin Chen. 01 Jul 2024.

uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes
Abdul Waheed, Karima Kadaoui, Bhiksha Raj, Muhammad Abdul-Mageed. 01 Jul 2024.

Direct Preference Knowledge Distillation for Large Language Models
Yixing Li, Yuxian Gu, Li Dong, Dequan Wang, Yu Cheng, Furu Wei. 28 Jun 2024.

Dual-Space Knowledge Distillation for Large Language Models
Songming Zhang, Xue Zhang, Zengkui Sun, Yufeng Chen, Jinan Xu. 25 Jun 2024.
Exploring compressibility of transformer based text-to-music (TTM) models
Vasileios Moschopoulos, Thanasis Kotsiopoulos, Pablo Peso Parada, Konstantinos Nikiforidis, Alexandros Stergiadis, Gerasimos Papakostas, Md. Asif Jalal, Jisi Zhang, Anastasios Drosou, Karthikeyan P. Saravanan. 24 Jun 2024.

Can Low-Rank Knowledge Distillation in LLMs be Useful for Microelectronic Reasoning?
N. Rouf, Fin Amin, Paul D. Franzon. 19 Jun 2024.

A Primal-Dual Framework for Transformers and Neural Networks [ViT]
Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher. 19 Jun 2024.

Optimized Speculative Sampling for GPU Hardware Accelerators
Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, K. Riedhammer, Tobias Bocklet. 16 Jun 2024.

3M: Multi-modal Multi-task Multi-teacher Learning for Game Event Detection
Thye Shan Ng, Feiqi Cao, S. Han. 13 Jun 2024.
MoreauPruner: Robust Pruning of Large Language Models against Weight Perturbations [AAML]
Zixiao Wang, Jingwei Zhang, Wenqian Zhao, Farzan Farnia, Bei Yu. 11 Jun 2024.

VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning
Oshin Dutta, Ritvik Gupta, Sumeet Agarwal. 07 Jun 2024.

PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang. 05 Jun 2024.

STAT: Shrinking Transformers After Training
Megan Flynn, Alexander Wang, Dean Edward Alvarez, Christopher De Sa, Anil Damle. 29 May 2024.

GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation
Wenjie Zhou, Zhenxin Ding, Xiaodong Zhang, Haibo Shi, Junfeng Wang, Dawei Yin. 06 May 2024.
Distilling Reasoning Ability from Large Language Models with Adaptive Thinking [ReLM, LRM]
Xiao Chen, Sihang Zhou, K. Liang, Xinwang Liu. 14 Apr 2024.

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini. 12 Apr 2024.

Efficiently Distilling LLMs for Edge Applications
Achintya Kundu, Fabian Lim, Aaron Chew, L. Wynter, Penny Chong, Rhui Dih Lee. 01 Apr 2024.

LNPT: Label-free Network Pruning and Training
Jinying Xiao, Ping Li, Zhe Tang, Jie Nie. 19 Mar 2024.

FBPT: A Fully Binary Point Transformer [MQ]
Zhixing Hou, Yuzhang Shang, Yan Yan. 15 Mar 2024.
Vanilla Transformers are Transfer Capability Teachers [MoE]
Xin Lu, Yanyan Zhao, Bing Qin. 04 Mar 2024.

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee. 03 Mar 2024.

On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving
Kaituo Feng, Changsheng Li, Dongchun Ren, Ye Yuan, Guoren Wang. 02 Mar 2024.

Differentially Private Knowledge Distillation via Synthetic Text Generation [SyDa]
James Flemings, Murali Annavaram. 01 Mar 2024.

Sinkhorn Distance Minimization for Knowledge Distillation
Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wen-gang Zhou, Houqiang Li. 27 Feb 2024.
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report [MoMe]
Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi. 25 Feb 2024.

C^3: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding
Taixi Lu, Haoyu Wang, Huajie Shao, Jing Gao, Huaxiu Yao. 25 Feb 2024.

Divide-or-Conquer? Which Part Should You Distill Your LLM? [LRM]
Zhuofeng Wu, Richard He Bai, Aonan Zhang, Jiatao Gu, V. Vydiswaran, Navdeep Jaitly, Yizhe Zhang. 22 Feb 2024.
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
Weilin Zhao, Yuxiang Huang, Xu Han, Wang Xu, Chaojun Xiao, Xinrong Zhang, Yewei Fang, Kaihuo Zhang, Zhiyuan Liu, Maosong Sun. 21 Feb 2024.

A Survey on Knowledge Distillation of Large Language Models [KELM, VLM]
Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou. 20 Feb 2024.

PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning [VLM]
Gyeongman Kim, Doohyuk Jang, Eunho Yang. 20 Feb 2024.

On Good Practices for Task-Specific Distillation of Large Pretrained Visual Models [VLM, MQ]
Juliette Marrie, Michael Arbel, Julien Mairal, Diane Larlus. 17 Feb 2024.
OneBit: Towards Extremely Low-bit Large Language Models [MQ]
Yuzhuang Xu, Xu Han, Zonghan Yang, Shuo Wang, Qingfu Zhu, Zhiyuan Liu, Weidong Liu, Wanxiang Che. 17 Feb 2024.

Model Compression and Efficient Inference for Large Language Models: A Survey [MQ]
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He. 15 Feb 2024.

Graph Inference Acceleration by Learning MLPs on Graphs without Supervision
Zehong Wang, Zheyuan Zhang, Chuxu Zhang, Yanfang Ye. 14 Feb 2024.

Understanding the Progression of Educational Topics via Semantic Matching
T. Alkhidir, Edmond Awad, Aamena Alshamsi. 10 Feb 2024.

ViT-MUL: A Baseline Study on Recent Machine Unlearning Methods Applied to Vision Transformers
Ikhyun Cho, Changyeon Park, J. Hockenmaier. 07 Feb 2024.
DistiLLM: Towards Streamlined Distillation for Large Language Models
Jongwoo Ko, Sungnyun Kim, Tianyi Chen, Se-Young Yun. 06 Feb 2024.

A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao. 05 Feb 2024.

DE^3-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks
Jianing He, Qi Zhang, Weiping Ding, Duoqian Miao, Jun Zhao, Liang Hu, Longbing Cao. 03 Feb 2024.

A Comprehensive Survey of Compression Algorithms for Language Models [MQ]
Seungcheol Park, Jaehyeon Choi, Sojin Lee, U. Kang. 27 Jan 2024.
Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion
Cunhang Fan, Yujie Chen, Jun Xue, Yonghui Kong, Jianhua Tao, Zhao Lv. 19 Jan 2024.

Knowledge Fusion of Large Language Models [MoMe]
Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi. 19 Jan 2024.

An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation
Md Arafat Sultan, Aashka Trivedi, Parul Awasthy, Avirup Sil. 12 Jan 2024.