Patient Knowledge Distillation for BERT Model Compression
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
arXiv:1908.09355 · 25 August 2019

Papers citing "Patient Knowledge Distillation for BERT Model Compression"

50 / 491 papers shown
Learning Light-Weight Translation Models from Deep Transformer
Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu
VLM · 27 Dec 2020
A Survey on Visual Transformer
Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, ..., Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao
ViT · 23 Dec 2020
Undivided Attention: Are Intermediate Layers Necessary for BERT?
S. N. Sridhar, Anthony Sarah
22 Dec 2020
Wasserstein Contrastive Representation Distillation
Liqun Chen, Dong Wang, Zhe Gan, Jingjing Liu, Ricardo Henao, Lawrence Carin
15 Dec 2020
Parameter-Efficient Transfer Learning with Diff Pruning
Demi Guo, Alexander M. Rush, Yoon Kim
14 Dec 2020
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li
14 Dec 2020
Reinforced Multi-Teacher Selection for Knowledge Distillation
Fei Yuan, Linjun Shou, J. Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang
11 Dec 2020
Improving Task-Agnostic BERT Distillation with Layer Mapping Search
Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
11 Dec 2020
Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains
Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Jun Huang
02 Dec 2020
EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications
Minghui Qiu, Peng Li, Chengyu Wang, Hanjie Pan, Yaliang Li, ..., Jun Yang, Yaliang Li, Jun Huang, Deng Cai, Wei Lin
VLM, SyDa · 18 Nov 2020
Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads
Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun
VLM · 07 Nov 2020
Sound Natural: Content Rephrasing in Dialog Systems
Arash Einolghozati, Anchit Gupta, K. Diedrick, S. Gupta
03 Nov 2020
MixKD: Towards Efficient Distillation of Large-scale Language Models
Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin
01 Nov 2020
Improved Synthetic Training for Reading Comprehension
Yanda Chen, Md Arafat Sultan, T. J. W. R. Center
SyDa · 24 Oct 2020
Optimal Subarchitecture Extraction For BERT
Adrian de Wynter, Daniel J. Perry
MQ · 20 Oct 2020
BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search
Yunjiang Jiang, Yue Shang, Ziyang Liu, Hongwei Shen, Yun Xiao, Wei Xiong, Sulong Xu, Weipeng P. Yan, Di Jin
20 Oct 2020
AutoADR: Automatic Model Design for Ad Relevance
Yiren Chen, Yaming Yang, Hong Sun, Yujing Wang, Yu Xu, Wei Shen, Rong Zhou, Yunhai Tong, Jing Bai, Ruofei Zhang
14 Oct 2020
Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search
Gyuwan Kim, Kyunghyun Cho
14 Oct 2020
Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression
Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin
14 Oct 2020
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy J. Lin, Rodrigo Nogueira, Andrew Yates
VLM · 13 Oct 2020
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance
Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin
13 Oct 2020
Load What You Need: Smaller Versions of Multilingual BERT
Amine Abdaoui, Camille Pradel, Grégoire Sigel
12 Oct 2020
Adversarial Self-Supervised Data-Free Distillation for Text Classification
Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu
10 Oct 2020
Deep Learning Meets Projective Clustering
Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman
08 Oct 2020
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu
MoE · 06 Oct 2020
Regularizing Dialogue Generation by Imitating Implicit Scenarios
Shaoxiong Feng, Xuancheng Ren, Hongshen Chen, Bin Sun, Kan Li, Xu Sun
05 Oct 2020
Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior
Zi Lin, Jeremiah Zhe Liu, Ziao Yang, Nan Hua, Dan Roth
05 Oct 2020
Which *BERT? A Survey Organizing Contextualized Encoders
Patrick Xia, Shijie Wu, Benjamin Van Durme
02 Oct 2020
Pea-KD: Parameter-efficient and Accurate Knowledge Distillation on BERT
Ikhyun Cho, U. Kang
30 Sep 2020
Contrastive Distillation on Intermediate Representations for Language Model Compression
S. Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu
VLM · 29 Sep 2020
TernaryBERT: Distillation-aware Ultra-low Bit BERT
Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu
MQ · 27 Sep 2020
Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning
Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Z. Li, Hang Liu, Caiwen Ding
VLM · 17 Sep 2020
Simplified TinyBERT: Knowledge Distillation for Document Retrieval
Xuanang Chen, Ben He, Kai Hui, Le Sun, Yingfei Sun
16 Sep 2020
Mimic and Conquer: Heterogeneous Tree Structure Distillation for Syntactic NLP
Hao Fei, Yafeng Ren, Donghong Ji
16 Sep 2020
DualDE: Dually Distilling Knowledge Graph Embedding for Faster and Cheaper Reasoning
Yushan Zhu, Wen Zhang, Mingyang Chen, Hui Chen, Xu-Xin Cheng, Wei Zhang, Huajun Chen
13 Sep 2020
Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation
M. Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman
11 Sep 2020
Accelerating Real-Time Question Answering via Question Generation
Yuwei Fang, Shuohang Wang, Zhe Gan, S. Sun, Jingjing Liu, Chenguang Zhu
OnRL · 10 Sep 2020
Compression of Deep Learning Models for Text: A Survey
Manish Gupta, Puneet Agrawal
VLM, MedIm, AI4CE · 12 Aug 2020
Understanding BERT Rankers Under Distillation
Luyu Gao, Zhuyun Dai, Jamie Callan
21 Jul 2020
Knowledge Distillation in Deep Learning and its Applications
Abdolmaged Alkhulaifi, Fahad Alsahli, Irfan Ahmad
FedML · 17 Jul 2020
Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution
Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel
30 Jun 2020
SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
F. Iandola, Albert Eaton Shaw, Ravi Krishna, Kurt Keutzer
VLM · 19 Jun 2020
Knowledge Distillation: A Survey
Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao
VLM · 09 Jun 2020
BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
07 Jun 2020
An Overview of Neural Network Compression
James O'Neill
AI4CE · 05 Jun 2020
Transferring Inductive Biases through Knowledge Distillation
Samira Abnar, Mostafa Dehghani, Willem H. Zuidema
31 May 2020
Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition
Yan Gao, Titouan Parcollet, Nicholas D. Lane
VLM · 19 May 2020
Distilling Knowledge from Pre-trained Language Models via Text Smoothing
Xing Wu, Y. Liu, Xiangyang Zhou, Dianhai Yu
08 May 2020
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos
MQ · 08 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li, Yen-Chun Chen, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu
MLLM, VLM, OffRL, AI4TS · 01 May 2020