ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.01351
  4. Cited By
Less is More: Task-aware Layer-wise Distillation for Language Model
  Compression

Less is More: Task-aware Layer-wise Distillation for Language Model Compression

4 October 2022
Chen Liang
Simiao Zuo
Qingru Zhang
Pengcheng He
Weizhu Chen
Tuo Zhao
    VLM
ArXivPDFHTML

Papers citing "Less is More: Task-aware Layer-wise Distillation for Language Model Compression"

16 / 16 papers shown
Title
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $α$-$β$-Divergence
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via ααα-βββ-Divergence
Guanghui Wang
Zhiyong Yang
Z. Wang
Shi Wang
Qianqian Xu
Q. Huang
37
0
0
07 May 2025
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
Nan Zhang
Yusen Zhang
Prasenjit Mitra
Rui Zhang
MQ
LRM
46
2
0
02 Apr 2025
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
Youhui Zuo
Sibo Wei
C. Zhang
Zhuorui Liu
Wenpeng Lu
Dawei Song
VLM
56
0
0
23 Mar 2025
Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning
Lizhen Xu
Xiuxiu Bai
Xiaojun Jia
Jianwu Fang
Shanmin Pang
61
0
0
13 Mar 2025
Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style
Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style
Xueran Han
Yuhan Liu
Mingzhe Li
W. Liu
Sen Hu
Rui Yan
Zhiqiang Xu
Xiuying Chen
62
0
0
24 Feb 2025
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
Activating Distributed Visual Region within LLMs for Efficient and Effective Vision-Language Training and Inference
Siyuan Wang
Dianyi Wang
Chengxing Zhou
Zejun Li
Zhihao Fan
Xuanjing Huang
Zhongyu Wei
VLM
105
0
0
17 Dec 2024
Quantifying Knowledge Distillation Using Partial Information Decomposition
Quantifying Knowledge Distillation Using Partial Information Decomposition
Pasan Dissanayake
Faisal Hamman
Barproda Halder
Ilia Sucholutsky
Qiuyi Zhang
Sanghamitra Dutta
36
0
0
12 Nov 2024
Future-Guided Learning: A Predictive Approach To Enhance Time-Series Forecasting
Future-Guided Learning: A Predictive Approach To Enhance Time-Series Forecasting
Skye Gunasekaran
Assel Kembay
Hugo J. Ladret
Rui-Jie Zhu
Laurent Udo Perrinet
Omid Kavehei
Jason Eshraghian
AI4TS
34
0
0
19 Oct 2024
What is the Role of Small Models in the LLM Era: A Survey
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
58
23
0
10 Sep 2024
Continual Distillation Learning: Knowledge Distillation in Prompt-based Continual Learning
Continual Distillation Learning: Knowledge Distillation in Prompt-based Continual Learning
Qifan Zhang
Yunhui Guo
Yu Xiang
VLM
CLL
49
0
0
18 Jul 2024
Model Adaptation for Time Constrained Embodied Control
Model Adaptation for Time Constrained Embodied Control
Jaehyun Song
Minjong Yoo
Honguk Woo
35
0
0
17 Jun 2024
Distilling Step-by-Step! Outperforming Larger Language Models with Less
  Training Data and Smaller Model Sizes
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Lokesh Nagalapatti
Chun-Liang Li
Chih-Kuan Yeh
Hootan Nakhost
Yasuhisa Fujii
Alexander Ratner
Ranjay Krishna
Chen-Yu Lee
Tomas Pfister
ALM
204
498
0
03 May 2023
Instruction Tuning with GPT-4
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
157
576
0
06 Apr 2023
Multitask Prompted Training Enables Zero-Shot Task Generalization
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
205
1,651
0
15 Oct 2021
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
221
196
0
07 Feb 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1