
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

11 June 2023
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, Hongyin Tang, Keqing He, Jiahao Liu, Jingang Wang, Shuo Zhao, Peng-Zhen Zhang, Jie Tang
ALM, MoE

Papers citing "GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model"

13 / 13 papers shown
QPruner: Probabilistic Decision Quantization for Structured Pruning in Large Language Models
Changhai Zhou, Yuhua Zhou, Shijie Han, Qian Qiao, Hongguang Li
MQ · 16 Dec 2024

Enhancing Knowledge Distillation of Large Language Models through Efficient Multi-Modal Distribution Alignment
Tianyu Peng, Jiajun Zhang
19 Sep 2024

RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural Pruned LLMs
Changhai Zhou, Shijie Han, Shiyang Zhang, Shichao Weng, Zekai Liu, Cheng Jin
22 Jun 2024

Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion
Markus Frey, Sichu Liang, Wentao Hu, Matthias Nau, Ju Jia, Shilin Wang
AAML · 21 Apr 2024

Head-wise Shareable Attention for Large Language Models
Zouying Cao, Yifei Yang, Hai Zhao
19 Feb 2024

IoT in the Era of Generative AI: Vision and Challenges
Xin Wang, Zhongwei Wan, Arvin Hekmati, M. Zong, Samiul Alam, Mi Zhang, Bhaskar Krishnamachari
03 Jan 2024

SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai Helen Li, Yiran Chen
MoE · 29 Oct 2023

Causal Distillation for Language Models
Zhengxuan Wu, Atticus Geiger, J. Rozner, Elisa Kreiss, Hanson Lu, Thomas F. Icard, Christopher Potts, Noah D. Goodman
05 Dec 2021

Distilling Linguistic Context for Language Model Compression
Geondo Park, Gyeongman Kim, Eunho Yang
17 Sep 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He
MoE · 18 Jan 2021

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li
14 Dec 2020

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
07 Feb 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 17 Sep 2019