Lifting the Curse of Capacity Gap in Distilling Language Models (arXiv:2305.12129)

20 May 2023
Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song [MoE]

Papers citing "Lifting the Curse of Capacity Gap in Distilling Language Models"

19 / 19 papers shown
1. TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models (28 Jan 2025)
   Makoto Shing, Kou Misaki, Han Bao, Sho Yokoi, Takuya Akiba [VLM]
2. MoDification: Mixture of Depths Made Easy (18 Oct 2024)
   C. Zhang, M. Zhong, Qimeng Wang, Xuantao Lu, Zheyu Ye, ..., Yan Gao, Yao Hu, Kehai Chen, Min Zhang, Dawei Song [VLM, MoE]
3. Don't Throw Away Data: Better Sequence Knowledge Distillation (15 Jul 2024)
   Jun Wang, Eleftheria Briakou, Hamid Dadkhahi, Rishabh Agarwal, Colin Cherry, Trevor Cohn
4. Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism (06 Jun 2024)
   Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai
5. PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs (05 Jun 2024)
   Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang
6. A Survey on Efficient Inference for Large Language Models (22 Apr 2024)
   Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu-Xiang Wang
7. Revisiting Knowledge Distillation for Autoregressive Language Models (19 Feb 2024)
   Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, Dacheng Tao [KELM]
8. MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices (28 Dec 2023)
   Xiangxiang Chu, Limeng Qiao, Xinyang Lin, Shuang Xu, Yang Yang, ..., Fei Wei, Xinyu Zhang, Bo-Wen Zhang, Xiaolin Wei, Chunhua Shen [MLLM]
9. The Efficiency Spectrum of Large Language Models: An Algorithmic Survey (01 Dec 2023)
   Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang
10. Towards the Law of Capacity Gap in Distilling Language Models (13 Nov 2023)
    Chen Zhang, Dawei Song, Zheyu Ye, Yan Gao [ELM]
11. Norm Tweaking: High-performance Low-bit Quantization of Large Language Models (06 Sep 2023)
    Liang Li, Qingyuan Li, Bo-Wen Zhang, Xiangxiang Chu [MQ]
12. FPTQ: Fine-grained Post-Training Quantization for Large Language Models (30 Aug 2023)
    Qingyuan Li, Yifan Zhang, Liang Li, Peng Yao, Bo-Wen Zhang, Xiangxiang Chu, Yerui Sun, Li-Qiang Du, Yuchen Xie [MQ]
13. Task-agnostic Distillation of Encoder-Decoder Language Models (21 May 2023)
    Chen Zhang, Yang Yang, Jingang Wang, Dawei Song
14. Weight-Inherited Distillation for Task-Agnostic BERT Compression (16 May 2023)
    Taiqiang Wu, Cheng-An Hou, Shanshan Lao, Jiayi Li, Ngai Wong, Zhe Zhao, Yujiu Yang
15. Distilling Linguistic Context for Language Model Compression (17 Sep 2021)
    Geondo Park, Gyeongman Kim, Eunho Yang
16. Learning Student-Friendly Teacher Networks for Knowledge Distillation (12 Feb 2021)
    D. Park, Moonsu Cha, C. Jeong, Daesin Kim, Bohyung Han
17. BinaryBERT: Pushing the Limit of BERT Quantization (31 Dec 2020)
    Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King [MQ]
18. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019)
    M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro [MoE]
19. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (20 Apr 2018)
    Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman [ELM]