GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
arXiv: 2306.06629 · 11 June 2023
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, Hongyin Tang, Keqing He, Jiahao Liu, Jingang Wang, Shuo Zhao, Peng-Zhen Zhang, Jie Tang
Tags: ALM, MoE
Papers citing "GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model" (13 of 13 papers shown)
1. QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models
   Changhai Zhou, Yuhua Zhou, Shijie Han, Qian Qiao, Hongguang Li · MQ · 16 Dec 2024

2. Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment
   Tianyu Peng, Jiajun Zhang · 19 Sep 2024

3. RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural Pruned LLMs
   Changhai Zhou, Shijie Han, Shiyang Zhang, Shichao Weng, Zekai Liu, Cheng Jin · 22 Jun 2024

4. Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion
   Markus Frey, Sichu Liang, Wentao Hu, Matthias Nau, Ju Jia, Shilin Wang · AAML · 21 Apr 2024

5. Head-wise Shareable Attention for Large Language Models
   Zouying Cao, Yifei Yang, Hai Zhao · 19 Feb 2024

6. IoT in the Era of Generative AI: Vision and Challenges
   Xin Wang, Zhongwei Wan, Arvin Hekmati, M. Zong, Samiul Alam, Mi Zhang, Bhaskar Krishnamachari · 03 Jan 2024

7. SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
   Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai Helen Li, Yiran Chen · MoE · 29 Oct 2023

8. Causal Distillation for Language Models
   Zhengxuan Wu, Atticus Geiger, J. Rozner, Elisa Kreiss, Hanson Lu, Thomas F. Icard, Christopher Potts, Noah D. Goodman · 05 Dec 2021

9. Distilling Linguistic Context for Language Model Compression
   Geondo Park, Gyeongman Kim, Eunho Yang · 17 Sep 2021

10. ZeRO-Offload: Democratizing Billion-Scale Model Training
    Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He · MoE · 18 Jan 2021

11. LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
    Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li · 14 Dec 2020

12. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
    Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou · 07 Feb 2020

13. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
    M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · MoE · 17 Sep 2019