Patient Knowledge Distillation for BERT Model Compression

25 August 2019
S. Sun
Yu Cheng
Zhe Gan
Jingjing Liu

Papers citing "Patient Knowledge Distillation for BERT Model Compression"

50 / 491 papers shown
Gradient-based Intra-attention Pruning on Pre-trained Language Models
Ziqing Yang
Yiming Cui
Xin Yao
Shijin Wang
VLM
15 Dec 2022
Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
Peng Lu
I. Kobyzev
Mehdi Rezagholizadeh
Ahmad Rashid
A. Ghodsi
Philippe Langlais
MoMe
12 Dec 2022
LEAD: Liberal Feature-based Distillation for Dense Retrieval
Hao-Lun Sun
Xiao Liu
Yeyun Gong
Anlei Dong
Jing Lu
Yan Zhang
Linjun Yang
Rangan Majumder
Nan Duan
10 Dec 2022
Enhancing Low-Density EEG-Based Brain-Computer Interfaces with Similarity-Keeping Knowledge Distillation
Xin Huang
Sung-Yu Chen
Chun-Shu Wei
06 Dec 2022
Vision Transformer Computation and Resilience for Dynamic Inference
Kavya Sreedhar
Jason Clemons
Rangharajan Venkatesan
S. Keckler
M. Horowitz
06 Dec 2022
Coordinating Cross-modal Distillation for Molecular Property Prediction
Hao Zhang
N. Zhang
Ruixin Zhang
Lei Shen
Yingyi Zhang
Meng Liu
30 Nov 2022
Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders
Minsoo Kim
Sihwa Lee
S. Hong
Duhyeuk Chang
Jungwook Choi
MQ
20 Nov 2022
Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT
Siyuan Lu
Chenchen Zhou
Keli Xie
Jun Lin
Zhongfeng Wang
16 Nov 2022
Gradient Knowledge Distillation for Pre-trained Language Models
Lean Wang
Lei Li
Xu Sun
VLM
02 Nov 2022
Numerical Optimizations for Weighted Low-rank Estimation on Language Model
Ting Hua
Yen-Chang Hsu
Felicity Wang
Qiang Lou
Yilin Shen
Hongxia Jin
02 Nov 2022
Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation
Cody Blakeney
Jessica Zosa Forde
Jonathan Frankle
Ziliang Zong
Matthew L. Leavitt
VLM
01 Nov 2022
Empirical Evaluation of Post-Training Quantization Methods for Language Tasks
Ting Hu
Christoph Meinel
Haojin Yang
MQ
29 Oct 2022
Teacher-Student Architecture for Knowledge Learning: A Survey
Chengming Hu
Xuan Li
Dan Liu
Xi Chen
Ju Wang
Xue Liu
28 Oct 2022
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
Q. Si
Yuanxin Liu
Zheng Lin
Peng Fu
Weiping Wang
VLM
26 Oct 2022
Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models
Stelios Maroudas
Sotiris Legkas
Prodromos Malakasiotis
Ilias Chalkidis
VLM
AILaw
ALM
ELM
24 Oct 2022
Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation
Ziqi Wang
Yuexin Wu
Frederick Liu
Daogao Liu
Le Hou
Hongkun Yu
Jing Li
Heng Ji
21 Oct 2022
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning
Tiannan Wang
Wangchunshu Zhou
Yan Zeng
Xinsong Zhang
VLM
14 Oct 2022
From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
Lei Li
Yankai Lin
Xuancheng Ren
Guangxiang Zhao
Peng Li
Jie Zhou
Xu Sun
VLM
11 Oct 2022
A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models
Yuanxin Liu
Fandong Meng
Zheng Lin
JiangNan Li
Peng Fu
Yanan Cao
Weiping Wang
Jie Zhou
11 Oct 2022
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
Chen Liang
Simiao Zuo
Qingru Zhang
Pengcheng He
Weizhu Chen
Tuo Zhao
VLM
04 Oct 2022
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics
Daniil Larionov
Jens Grunwald
Christoph Leiter
Steffen Eger
20 Sep 2022
Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching
Kunbo Ding
Weijie Liu
Yuejian Fang
Zhe Zhao
Qi Ju
Xuefeng Yang
13 Sep 2022
ViTKD: Practical Guidelines for ViT feature knowledge distillation
Zhendong Yang
Zhe Li
Ailing Zeng
Zexian Li
Chun Yuan
Yu Li
06 Sep 2022
Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
Rajiv Movva
Jinhao Lei
Shayne Longpre
Ajay K. Gupta
Chris DuBois
VLM
MQ
20 Aug 2022
Teacher Guided Training: An Efficient Framework for Knowledge Transfer
Manzil Zaheer
A. S. Rawat
Seungyeon Kim
Chong You
Himanshu Jain
Andreas Veit
Rob Fergus
Surinder Kumar
VLM
14 Aug 2022
Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction
Lei Li
Zhiyuan Zhang
Ruihan Bao
Keiko Harimoto
Xu Sun
04 Aug 2022
Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers
Ji Xin
Raphael Tang
Zhiying Jiang
Yaoliang Yu
Jimmy J. Lin
31 Jul 2022
Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing
Aditya Desai
K. Zhou
Anshumali Shrivastava
21 Jul 2022
Confident Adaptive Language Modeling
Tal Schuster
Adam Fisch
Jai Gupta
Mostafa Dehghani
Dara Bahri
Vinh Q. Tran
Yi Tay
Donald Metzler
14 Jul 2022
Rethinking Attention Mechanism in Time Series Classification
Bowen Zhao
Huanlai Xing
Xinhan Wang
Fuhong Song
Zhiwen Xiao
AI4TS
14 Jul 2022
Dynamic Contrastive Distillation for Image-Text Retrieval
Jun Rao
Liang Ding
Shuhan Qi
Meng Fang
Yang Liu
Liqiong Shen
Dacheng Tao
VLM
04 Jul 2022
Factorizing Knowledge in Neural Networks
Xingyi Yang
Jingwen Ye
Xinchao Wang
MoMe
04 Jul 2022
Language model compression with weighted low-rank factorization
Yen-Chang Hsu
Ting Hua
Sung-En Chang
Qiang Lou
Yilin Shen
Hongxia Jin
30 Jun 2022
Knowledge Distillation of Transformer-based Language Models Revisited
Chengqiang Lu
Jianwei Zhang
Yunfei Chu
Zhengyu Chen
Jingren Zhou
Fei Wu
Haiqing Chen
Hongxia Yang
VLM
29 Jun 2022
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation
Kshitij Gupta
Devansh Gautam
R. Mamidi
VLM
07 Jun 2022
Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
Aidan Good
Jia-Huei Lin
Hannah Sieg
Mikey Ferguson
Xin Yu
Shandian Zhe
J. Wieczorek
Thiago Serra
07 Jun 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLM
MQ
04 Jun 2022
Extreme Compression for Pre-trained Transformers Made Simple and Efficient
Xiaoxia Wu
Z. Yao
Minjia Zhang
Conglong Li
Yuxiong He
MQ
04 Jun 2022
Differentially Private Model Compression
Fatemehsadat Mireshghallah
A. Backurs
Huseyin A. Inan
Lukas Wutschitz
Janardhan Kulkarni
SyDa
03 Jun 2022
Transformer with Fourier Integral Attentions
T. Nguyen
Minh Pham
Tam Nguyen
Khai Nguyen
Stanley J. Osher
Nhat Ho
01 Jun 2022
MiniDisc: Minimal Distillation Schedule for Language Model Compression
Chen Zhang
Yang Yang
Qifan Wang
Jiahao Liu
Jingang Wang
Wei Yu Wu
Dawei Song
29 May 2022
Parameter-Efficient and Student-Friendly Knowledge Distillation
Jun Rao
Xv Meng
Liang Ding
Shuhan Qi
Dacheng Tao
28 May 2022
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Clara Na
Sanket Vaibhav Mehta
Emma Strubell
25 May 2022
Do we need Label Regularization to Fine-tune Pre-trained Language Models?
I. Kobyzev
A. Jafari
Mehdi Rezagholizadeh
Tianda Li
Alan Do-Omri
Peng Lu
Pascal Poupart
A. Ghodsi
25 May 2022
Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT
James Lee-Thorp
Joshua Ainslie
MoE
24 May 2022
Exploring Extreme Parameter Compression for Pre-trained Language Models
Yuxin Ren
Benyou Wang
Lifeng Shang
Xin Jiang
Qun Liu
20 May 2022
Chemical transformer compression for accelerating both training and inference of molecular modeling
Yi Yu
K. Börjesson
16 May 2022
Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt
Xinyin Ma
Xinchao Wang
Gongfan Fang
Yongliang Shen
Weiming Lu
16 May 2022
Task-specific Compression for Multi-task Language Models using Attribution-based Pruning
Nakyeong Yang
Yunah Jang
Hwanhee Lee
Seohyeong Jung
Kyomin Jung
09 May 2022
Knowledge Distillation of Russian Language Models with Reduction of Vocabulary
A. Kolesnikova
Yuri Kuratov
Vasily Konovalov
Mikhail Burtsev
VLM
04 May 2022