Patient Knowledge Distillation for BERT Model Compression
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
arXiv:1908.09355 · 25 August 2019

Papers citing "Patient Knowledge Distillation for BERT Model Compression"

41 of 491 citing papers shown:
LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning
Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu
27 Apr 2020

Training with Quantization Noise for Extreme Model Compression
Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin
MQ · 15 Apr 2020

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
Subhabrata Mukherjee, Ahmed Hassan Awadallah
12 Apr 2020

LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang-Feng Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai
08 Apr 2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
MQ · 08 Apr 2020

Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Xinyu Wang, Yong-jia Jiang, Nguyen Bach, Tao Wang, Fei Huang, Kewei Tu
08 Apr 2020

On the Effect of Dropping Layers of Pre-trained Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
08 Apr 2020

Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation
Bowen Wu, Huan Zhang, Mengyuan Li, Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang
07 Apr 2020

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
MQ · 06 Apr 2020

FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
05 Apr 2020

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He
AI4CE · 29 Mar 2020

Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
LM&MA, VLM · 18 Mar 2020

A Survey on Contextual Embeddings
Qi Liu, Matt J. Kusner, Phil Blunsom
16 Mar 2020

TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Zhiheng Huang, Peng-Tao Xu, Davis Liang, Ajay K. Mishra, Bing Xiang
16 Mar 2020

Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation
Mitchell A. Gordon, Kevin Duh
CLL, VLM · 05 Mar 2020

TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu
VLM · 28 Feb 2020

A Primer in BERTology: What we know about how BERT works
Anna Rogers, Olga Kovaleva, Anna Rumshisky
OffRL · 27 Feb 2020

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Y. Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett
AI4CE · 27 Feb 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
26 Feb 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
VLM · 25 Feb 2020

Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Yige Xu, Xipeng Qiu, L. Zhou, Xuanjing Huang
24 Feb 2020

ScopeIt: Scoping Task Relevant Sentences in Documents
Vishwas Suryanarayanan, Barun Patra, P. Bhattacharya, C. Fufa, Charles Lee
23 Feb 2020

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Wenhao Lu, Jian Jiao, Ruofei Zhang
14 Feb 2020

Subclass Distillation
Rafael Müller, Simon Kornblith, Geoffrey E. Hinton
10 Feb 2020

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
07 Feb 2020

PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Saurabh Goyal, Anamitra R. Choudhury, Saurabh ManishRaje, Venkatesan T. Chakaravarthy, Yogish Sabharwal, Ashish Verma
24 Jan 2020

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou
MQ · 13 Jan 2020

The State of Knowledge Distillation for Classification
Fabian Ruffy, K. Chahal
20 Dec 2019

WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian, A. Kreuzer, Pai-Hung Chen, Hans-Martin Will
VLM · 13 Dec 2019

Unsupervised Pre-training for Natural Language Generation: A Literature Review
Yuanxin Liu, Zheng Lin
SSL, AI4CE · 13 Nov 2019

Distilling Knowledge Learned in BERT for Text Generation
Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu
10 Nov 2019

MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models
Linqing Liu, Haiquan Wang, Jimmy J. Lin, R. Socher, Caiming Xiong
09 Nov 2019

Structured Pruning of Large Language Models
Ziheng Wang, Jeremy Wohlwend, Tao Lei
10 Oct 2019

Knowledge Distillation from Internal Representations
Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Edward Guo
08 Oct 2019

Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data
Subhabrata Mukherjee, Ahmed Hassan Awadallah
04 Oct 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
SSL, AIMat · 26 Sep 2019

Extremely Small BERT Models from Mixed-Vocabulary Training
Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou
VLM · 25 Sep 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
VLM · 23 Sep 2019

DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning
Xiawu Zheng, Chenyi Yang, Shaokun Zhang, Yan Wang, Baochang Zhang, Yongjian Wu, Yunsheng Wu, Ling Shao, Rongrong Ji
28 May 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 20 Apr 2018

A Survey of Model Compression and Acceleration for Deep Neural Networks
Yu Cheng, Duo Wang, Pan Zhou, Zhang Tao
23 Oct 2017