arXiv: 2311.07052
Towards the Law of Capacity Gap in Distilling Language Models


13 November 2023
Chen Zhang
Dawei Song
Zheyu Ye
Yan Gao
    ELM

Papers citing "Towards the Law of Capacity Gap in Distilling Language Models"

19 / 19 papers shown
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
39
0
0
10 Apr 2025
LLMs in Mobile Apps: Practices, Challenges, and Opportunities
Kimberly Hau
Safwat Hassan
Shurui Zhou
57
0
0
21 Feb 2025
ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty
M. Zhong
Xikai Liu
C. Zhang
Yikun Lei
Yan Gao
Yao Hu
Kehai Chen
Min Zhang
70
0
0
12 Dec 2024
FASTNav: Fine-tuned Adaptive Small-language-models Trained for Multi-point Robot Navigation
Yuxuan Chen
Yixin Han
Xiao Li
64
1
0
20 Nov 2024
MoDification: Mixture of Depths Made Easy
C. Zhang
M. Zhong
Qimeng Wang
Xuantao Lu
Zheyu Ye
...
Yan Gao
Yao Hu
Kehai Chen
Min Zhang
Dawei Song
VLM
MoE
30
2
0
18 Oct 2024
DDK: Distilling Domain Knowledge for Efficient Large Language Models
Jiaheng Liu
Chenchen Zhang
Jinyang Guo
Yuanxing Zhang
Haoran Que
...
Congnan Liu
Wenbo Su
Jiamang Wang
Lin Qu
Bo Zheng
43
3
0
23 Jul 2024
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application
Chuanpeng Yang
Wang Lu
Yao Zhu
Yidong Wang
Qian Chen
Chenlong Gao
Bingjie Yan
Yiqiang Chen
ALM
KELM
44
20
0
02 Jul 2024
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective
M. Zhong
Chen Zhang
Yikun Lei
Xikai Liu
Yan Gao
Yao Hu
Kehai Chen
Min Zhang
35
5
0
19 Jun 2024
Prompting Large Language Models with Audio for General-Purpose Speech Summarization
Wonjune Kang
Deb Roy
LRM
16
7
0
10 Jun 2024
Beyond the Speculative Game: A Survey of Speculative Execution in Large Language Models
Chen Zhang
Zhuorui Liu
Dawei Song
LRM
20
3
0
23 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
46
78
0
22 Apr 2024
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
Chandeepa Dissanayake
Lahiru Lowe
Sachith Gunasekara
Yasiru Ratnayake
MoE
ALM
19
1
0
18 Apr 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
36
46
0
15 Feb 2024
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning
Yutao Zhu
Peitian Zhang
Chenghao Zhang
Yifei Chen
Binyu Xie
Zheng Liu
Ji-Rong Wen
Zhicheng Dou
11
14
0
12 Jan 2024
Task-agnostic Distillation of Encoder-Decoder Language Models
Chen Zhang
Yang Yang
Jingang Wang
Dawei Song
22
3
0
21 May 2023
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Cheng-Yu Hsieh
Chun-Liang Li
Chih-Kuan Yeh
Hootan Nakhost
Yasuhisa Fujii
Alexander Ratner
Ranjay Krishna
Chen-Yu Lee
Tomas Pfister
ALM
204
498
0
03 May 2023
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
86
332
0
05 Jan 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,977
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
223
4,424
0
23 Jan 2020