Patient Knowledge Distillation for BERT Model Compression
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
arXiv 1908.09355 · 25 August 2019
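For orientation before the citation list, a brief sketch of what the cited paper proposes: Patient Knowledge Distillation (PKD) trains a compact BERT student against both the teacher's soft predictions and the teacher's intermediate [CLS] representations, rather than the output layer alone. A minimal sketch of the training objective, recalled from the paper (the symbols and weighting below are a paraphrase, not a verbatim quote):

\[
\mathcal{L}_{\mathrm{PKD}}
  = (1-\alpha)\,\mathcal{L}_{\mathrm{CE}}
  + \alpha\,\mathcal{L}_{\mathrm{DS}}
  + \beta \sum_{i=1}^{N}\sum_{j=1}^{M}
    \left\lVert
      \frac{\mathbf{h}_{i,j}^{s}}{\lVert \mathbf{h}_{i,j}^{s} \rVert_2}
      - \frac{\mathbf{h}_{i,I(j)}^{t}}{\lVert \mathbf{h}_{i,I(j)}^{t} \rVert_2}
    \right\rVert_2^2
\]

Here \(\mathcal{L}_{\mathrm{CE}}\) is the task cross-entropy on ground-truth labels, \(\mathcal{L}_{\mathrm{DS}}\) is the distillation loss on the teacher's soft labels, \(\mathbf{h}_{i,j}^{s}\) and \(\mathbf{h}_{i,I(j)}^{t}\) are the [CLS] hidden states of student layer \(j\) and its assigned teacher layer \(I(j)\) for example \(i\), and the layer mapping \(I(\cdot)\) follows the paper's PKD-Skip (every k-th teacher layer) or PKD-Last (the final teacher layers) strategy.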
Papers citing "Patient Knowledge Distillation for BERT Model Compression" (showing 50 of 491; topic tags from the index are kept in brackets):
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via α-β-Divergence
Guanghui Wang, Zhiyong Yang, Z. Wang, Shi Wang, Qianqian Xu, Q. Huang (07 May 2025)

KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan, Guoqing Luo, Michael Bowling, Lili Mou (26 Apr 2025) [OffRL]

HMI: Hierarchical Knowledge Management for Efficient Multi-Tenant Inference in Pretrained Language Models
J. Zhang, J. Wang, H. Li, Lidan Shou, Ke Chen, Gang Chen, Qin Xie, Guiming Xie, Xuejian Gong (24 Apr 2025)

Learning Critically: Selective Self Distillation in Federated Learning on Non-IID Data
Yuting He, Yiqiang Chen, Xiaodong Yang, H. Yu, Yi-Hua Huang, Yang Gu (20 Apr 2025) [FedML]

A Dual-Space Framework for General Knowledge Distillation of Large Language Models
X. Zhang, Songming Zhang, Yunlong Liang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou (15 Apr 2025)

Multi-Sense Embeddings for Language Models and Knowledge Distillation
Qitong Wang, Mohammed J. Zaki, Georgios Kollias, Vasileios Kalantzis (08 Apr 2025) [KELM]
Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression
Dohyun Kim, S. Park, Geonhee Han, Seung Wook Kim, Paul Hongsuck Seo (02 Apr 2025) [DiffM]

Not All LoRA Parameters Are Essential: Insights on Inference Necessity
Guanhua Chen, Yutong Yao, Ci-Jun Gao, Lidia S. Chao, Feng Wan, Derek F. Wong (30 Mar 2025)

Delving Deep into Semantic Relation Distillation
Zhaoyi Yan, Kangjun Liu, Qixiang Ye (27 Mar 2025)

Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition
Muxin Pu, Mei Kuan Lim, Chun Yong Chong (26 Mar 2025) [SLR]

Efficient Knowledge Distillation via Curriculum Extraction
Shivam Gupta, Sushrut Karmalkar (21 Mar 2025)

Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement
Shu Yang, C. Yu, Lei Liu, Hanzhi Ma, Aili Wang, Erping Li (20 Mar 2025)
EPEE: Towards Efficient and Effective Foundation Models in Biomedicine
Zaifu Zhan, Shuang Zhou, Huixue Zhou, Z. Liu, Rui Zhang (03 Mar 2025)

CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems
Rui Liu, Yu-cui Shen, Peng Gao, Pratap Tokekar, Ming C. Lin (25 Feb 2025)

"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts
Rabindra Lamsal, M. Read, S. Karunasekera, Muhammad Imran (24 Feb 2025)

Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
Gyeongman Kim, Gyouk Chu, Eunho Yang (18 Feb 2025) [MoE]

MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning
Shengbo Gu, Yu-Kun Qiu, Yu-Ming Tang, Ancong Wu, Wei-Shi Zheng (04 Feb 2025)

BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Divya J. Bajpai, M. Hanawal (02 Feb 2025)

Merino: Entropy-driven Design for Generative Language Models on IoT Devices
Youpeng Zhao, Ming Lin, Huadong Tang, Qiang Wu, Jun Wang (28 Jan 2025)

Multi-stage Training of Bilingual Islamic LLM for Neural Passage Retrieval
Vera Pavlova (20 Jan 2025)
CURing Large Models: Compression via CUR Decomposition
Sanghyeon Park, Soo-Mook Moon (08 Jan 2025)

Knowledge Distillation with Adapted Weight
Sirong Wu, Xi Luo, Junjie Liu, Yuhui Deng (06 Jan 2025)

Training MLPs on Graphs without Supervision
Zehong Wang, Zheyuan Zhang, Chuxu Zhang, Yanfang Ye (05 Dec 2024)

Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
Y. Fu, Yin Yu, Xiaotian Han, Runchao Li, Xianxuan Long, Haotian Yu, Pan Li (25 Nov 2024) [SyDa]

MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah, Shan Lu, Chao Gao, Di Niu (20 Nov 2024)

SoftLMs: Efficient Adaptive Low-Rank Approximation of Language Models using Soft-Thresholding Mechanism
Priyansh Bhatnagar, Linfeng Wen, Mingu Kang (15 Nov 2024)

Building an Efficient Multilingual Non-Profit IR System for the Islamic Domain Leveraging Multiprocessing Design in Rust
Vera Pavlova, Mohammed Makhlouf (09 Nov 2024)

Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment
Chengting Yu, Fengzhao Zhang, Ruizhe Chen, Zuozhu Liu, Shurun Tan, Er-ping Li, Aili Wang (03 Nov 2024)
MoE-I²: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Yuanlin Duan, Wenqi Jia, Miao Yin, Yu Cheng, Bo Yuan (01 Nov 2024) [MoE]

Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation
Ahmed Akib Jawad Karim, Kazi Hafiz Md. Asad, Md. Golam Rabiul Alam (30 Oct 2024) [AI4MH]

SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
Jahyun Koo, Yerin Hwang, Yongil Kim, Taegwan Kang, Hyunkyung Bae, Kyomin Jung (25 Oct 2024)

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, ..., Rakesh Shivanna, Sashank J. Reddi, A. Menon, Rohan Anil, Sanjiv Kumar (24 Oct 2024)

MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu, Hao Zhou, Fandong Meng, Jie Zhou, Minlie Huang (22 Oct 2024)

Model Mimic Attack: Knowledge Distillation for Provably Transferable Adversarial Examples
Kirill Lukyanov, Andrew Perminov, D. Turdakov, Mikhail Pautov (21 Oct 2024) [AAML]
Reducing the Transformer Architecture to a Minimum
Bernhard Bermeitinger, T. Hrycej, Massimo Pavone, Julianus Kath, Siegfried Handschuh (17 Oct 2024)

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He, Philip N. Garner (09 Oct 2024)

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang (02 Oct 2024)

FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices
Zhidong Gao, Yu Zhang, Zhenxiao Zhang, Yanmin Gong, Yuanxiong Guo (01 Oct 2024)

General Compression Framework for Efficient Transformer Object Tracking
Lingyi Hong, Jinglun Li, Xinyu Zhou, Shilin Yan, Pinxue Guo, ..., Zhaoyu Chen, Shuyong Gao, Wei Zhang, Hong Lu, Wenqiang Zhang (26 Sep 2024) [ViT]

Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models
Jun Rao, Xuebo Liu, Zepeng Lin, Liang Ding, Jing Li, Dacheng Tao, Min Zhang (19 Sep 2024)

LLMR: Knowledge Distillation with a Large Language Model-Induced Reward
Dongheng Li, Yongchang Hao, Lili Mou (19 Sep 2024)
SDP: Spiking Diffusion Policy for Robotic Manipulation with Learnable Channel-Wise Membrane Thresholds
Zhixing Hou, Maoxu Gao, Hang Yu, Mengyu Yang, Chio-in Ieong (17 Sep 2024)

MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu (19 Aug 2024)

FuseChat: Knowledge Fusion of Chat Models
Fanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen, Xiaojun Quan (15 Aug 2024) [ALM, KELM, MoMe]

Using Advanced LLMs to Enhance Smaller LLMs: An Interpretable Knowledge Distillation Approach
Tong Wang, K. Sudhir, Dat Hong (13 Aug 2024)

ProFuser: Progressive Fusion of Large Language Models
Tianyuan Shi, Fanqi Wan, Canbin Huang, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang (09 Aug 2024) [MoMe]

Accelerating Large Language Model Inference with Self-Supervised Early Exits
Florian Valade (30 Jul 2024) [LRM]

Dataset Distillation for Offline Reinforcement Learning
Jonathan Light, Yuanzhe Liu, Ziniu Hu (29 Jul 2024) [DD]

LLAVADI: What Matters For Multimodal Large Language Models Distillation
Shilin Xu, Xiangtai Li, Haobo Yuan, Lu Qi, Yunhai Tong, Ming-Hsuan Yang (28 Jul 2024)

Reconstruct the Pruned Model without Any Retraining
Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang (18 Jul 2024)