Patient Knowledge Distillation for BERT Model Compression

25 August 2019 · arXiv:1908.09355
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu

Papers citing "Patient Knowledge Distillation for BERT Model Compression"

50 / 491 papers shown

EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing
Chengyu Wang, Minghui Qiu, Chen Shi, Taolin Zhang, Tingting Liu, Lei Li, J. Wang, Ming Wang, Jun Huang, W. Lin
30 Apr 2022

Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
27 Apr 2022 · 3DV

Ultra Fast Speech Separation Model with Teacher Student Learning
Sanyuan Chen, Yu-Huan Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu
27 Apr 2022

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, ..., Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan
22 Apr 2022 · VLM, OffRL

ALBETO and DistilBETO: Lightweight Spanish Language Models
J. Cañete, S. Donoso, Felipe Bravo-Marquez, Andrés Carvallo, Vladimir Araujo
19 Apr 2022

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, T. Zhao, Weizhu Chen
15 Apr 2022 · MoE

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart
15 Apr 2022 · CLL

Structured Pruning Learns Compact and Accurate Models
Mengzhou Xia, Zexuan Zhong, Danqi Chen
01 Apr 2022 · VLM

Feature Structure Distillation with Centered Kernel Alignment in BERT Transferring
Heeseung Jung, Doyeon Kim, Seung-Hoon Na, Kangil Kim
01 Apr 2022

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings
Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Defu Lian, ..., Hao-Lun Sun, Yingxia Shao, Denvy Deng, Qi Zhang, Xing Xie
01 Apr 2022

Self-Distillation from the Last Mini-Batch for Consistency Regularization
Yiqing Shen, Liwu Xu, Yuzhe Yang, Yaqian Li, Yandong Guo
30 Mar 2022

Graph Neural Networks in IoT: A Survey
Guimin Dong, Mingyue Tang, Zhiyuan Wang, Jiechao Gao, Sikun Guo, Lihua Cai, Robert Gutierrez, Brad Campbell, Laura E. Barnes, M. Boukhechba
29 Mar 2022 · GNN, AI4CE

A Fast Post-Training Pruning Framework for Transformers
Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, A. Gholami
29 Mar 2022

Knowledge Distillation: Bad Models Can Be Good Role Models
Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz
28 Mar 2022 · FedML

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang, A. Khetan, Rene Bidart, Zohar S. Karnin
27 Mar 2022

Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová, Marian Verhelst
20 Mar 2022

Knowledge Amalgamation for Object Detection with Transformers
Haofei Zhang, Feng Mao, Mengqi Xue, Gongfan Fang, Zunlei Feng, Jie Song, Mingli Song
07 Mar 2022 · ViT

A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
Tianxiang Sun, Xiangyang Liu, Wei-wei Zhu, Zhichao Geng, Lingling Wu, Yilong He, Yuan Ni, Guotong Xie, Xuanjing Huang, Xipeng Qiu
03 Mar 2022

TrimBERT: Tailoring BERT for Trade-offs
S. N. Sridhar, Anthony Sarah, Sairam Sundaresan
24 Feb 2022 · MQ

A Survey on Model Compression and Acceleration for Pretrained Language Models
Canwen Xu, Julian McAuley
15 Feb 2022

Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation
Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiao-Xue Liang
08 Feb 2022

Improving Robustness by Enhancing Weak Subnets
Yong Guo, David Stutz, Bernt Schiele
30 Jan 2022 · AAML

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
29 Jan 2022

AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models
Xiaofan Zhang, Zongwei Zhou, Deming Chen, Yu Emma Wang
21 Jan 2022

VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer
Mengshu Sun, Haoyu Ma, Guoliang Kang, Yifan Jiang, Tianlong Chen, Xiaolong Ma, Zhangyang Wang, Yanzhi Wang
17 Jan 2022 · ViT

Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti
15 Jan 2022

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan
15 Jan 2022 · CLIP, VLM

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He
14 Jan 2022

GateFormer: Speeding Up News Feed Recommendation with Input Gated Transformers
Peitian Zhang, Zheng Liu
12 Jan 2022 · AI4TS

Conditional Generative Data-free Knowledge Distillation
Xinyi Yu, Ling Yan, Yang Yang, Libo Zhou, Linlin Ou
31 Dec 2021

Data-Free Knowledge Transfer: A Survey
Yuang Liu, Wei Zhang, Jun Wang, Jianyong Wang
31 Dec 2021

Automatic Mixed-Precision Quantization Search of BERT
Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin
30 Dec 2021 · MQ

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, ..., Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang
23 Dec 2021 · ELM

Distilled Dual-Encoder Model for Vision-Language Understanding
Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei
16 Dec 2021 · VLM, FedML

Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models
Lei Li, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun
14 Dec 2021 · MoMe

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression
Runxin Xu, Fuli Luo, Chengyu Wang, Baobao Chang, Jun Huang, Songfang Huang, Fei Huang
14 Dec 2021 · VLM

On the Compression of Natural Language Models
S. Damadi
13 Dec 2021

DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings
Chaochen Gao, Xing Wu, Peng Wang, Jue Wang, Liangjun Zang, Zhongyuan Wang, Songlin Hu
10 Dec 2021

VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction
Dan Li, Yang Yang, Hongyin Tang, Jingang Wang, Tong Bill Xu, Wei Yu Wu, Enhong Chen
08 Dec 2021

Causal Distillation for Language Models
Zhengxuan Wu, Atticus Geiger, J. Rozner, Elisa Kreiss, Hanson Lu, Thomas F. Icard, Christopher Potts, Noah D. Goodman
05 Dec 2021

WiFi-based Multi-task Sensing
Xie Zhang, Chengpei Tang, Yasong An, Kang Yin
26 Nov 2021

Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
22 Nov 2021

A Survey on Green Deep Learning
Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li
08 Nov 2021 · VLM

Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval
Linlong Xu, Baosong Yang, Xiaoyu Lv, Tianchi Bi, Dayiheng Liu, Haibo Zhang
03 Nov 2021

Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning
Xuanli He, I. Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul M. Chilimbi
30 Oct 2021

NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM
Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu
28 Oct 2021

Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
Gongfan Fang, Yifan Bao, Jie Song, Xinchao Wang, Don Xie, Chengchao Shen, Mingli Song
27 Oct 2021

How and When Adversarial Robustness Transfers in Knowledge Distillation?
Rulin Shao, Ming Zhou, C. Bezemer, Cho-Jui Hsieh
22 Oct 2021 · AAML

Ensemble ALBERT on SQuAD 2.0
Shilun Li, Renee Li, Veronica Peng
19 Oct 2021 · MoE

Improving Transformers with Probabilistic Attention Keys
Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher
16 Oct 2021