Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
arXiv:1903.12136 · 28 March 2019
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin
Papers citing "Distilling Task-Specific Knowledge from BERT into Simple Neural Networks" (50 of 50 papers shown)
Mitigating Catastrophic Forgetting in the Incremental Learning of Medical Images. Sara Yavari, Jacob Furst. 28 Apr 2025.
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity. Mutian He, Philip N. Garner. 09 Oct 2024.
Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models. Manveer Singh Tamber, Jasper Xian, Jimmy Lin. 13 Jun 2024.
Augmenting Offline RL with Unlabeled Data. Zhao Wang, Briti Gangopadhyay, Jia-Fong Yeh, Shingo Takamatsu. 11 Jun 2024.
Integrating Domain Knowledge for handling Limited Data in Offline RL. Briti Gangopadhyay, Zhao Wang, Jia-Fong Yeh, Shingo Takamatsu. 11 Jun 2024.
Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation. Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo. 23 Apr 2024.
Teaching MLP More Graph Information: A Three-stage Multitask Knowledge Distillation Framework. Junxian Li, Bin Shi, Erfei Cui, Hua Wei, Qinghua Zheng. 02 Mar 2024.
Confidence Preservation Property in Knowledge Distillation Abstractions. Dmitry Vengertsev, Elena Sherman. 21 Jan 2024.
Mixed Distillation Helps Smaller Language Model Better Reasoning. Chenglin Li, Qianglong Chen, Liangyue Li, Wang Caiyu, Yicheng Li, Zhang Yin, Yin Zhang. 17 Dec 2023.
Teacher-Student Architecture for Knowledge Distillation: A Survey. Chengming Hu, Xuan Li, Danyang Liu, Haolun Wu, Xi Chen, Ju Wang, Xue Liu. 08 Aug 2023.
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models. Seungcheol Park, Ho-Jin Choi, U. Kang. 07 Aug 2023.
f-Divergence Minimization for Sequence-Level Knowledge Distillation. Yuqiao Wen, Zichao Li, Wenyu Du, Lili Mou. 27 Jul 2023.
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition. Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu. 20 Jul 2023.
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers. Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao. 19 Feb 2023.
Distillation of encoder-decoder transformers for sequence labelling. M. Farina, D. Pappadopulo, Anant Gupta, Leslie Huang, Ozan Irsoy, Thamar Solorio. 10 Feb 2023.
Gradient Knowledge Distillation for Pre-trained Language Models. Lean Wang, Lei Li, Xu Sun. 02 Nov 2022.
Teacher-Student Architecture for Knowledge Learning: A Survey. Chengming Hu, Xuan Li, Dan Liu, Xi Chen, Ju Wang, Xue Liu. 28 Oct 2022.
An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts. Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, TN ShashiBhushan, Simon Corston-Oliver. 27 Sep 2022.
Chemical transformer compression for accelerating both training and inference of molecular modeling. Yi Yu, K. Börjesson. 16 May 2022.
Adaptable Adapters. N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych. 03 May 2022.
Attention Mechanism with Energy-Friendly Operations. Yu Wan, Baosong Yang, Dayiheng Liu, Rong Xiao, Derek F. Wong, Haibo Zhang, Boxing Chen, Lidia S. Chao. 28 Apr 2022.
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention. Zuzana Jelčicová, Marian Verhelst. 20 Mar 2022.
Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models. Qinyuan Ye, Madian Khabsa, M. Lewis, Sinong Wang, Xiang Ren, Aaron Jaech. 16 Oct 2021.
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network. Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari. 22 Sep 2021.
Student Surpasses Teacher: Imitation Attack for Black-Box NLP APIs. Qiongkai Xu, Xuanli He, Lingjuan Lyu, Lizhen Qu, Gholamreza Haffari. 29 Aug 2021.
Knowledge Distillation for Quality Estimation. Amit Gajbhiye, M. Fomicheva, Fernando Alva-Manchego, Frédéric Blain, A. Obamuyide, Nikolaos Aletras, Lucia Specia. 01 Jul 2021.
XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation. Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao. 08 Jun 2021.
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures. Sushant Singh, A. Mahmood. 23 Mar 2021.
Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation. Lingyun Feng, Minghui Qiu, Yaliang Li, Haitao Zheng, Ying Shen. 20 Jan 2021.
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding. Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li. 14 Dec 2020.
Reinforced Multi-Teacher Selection for Knowledge Distillation. Fei Yuan, Linjun Shou, J. Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang. 11 Dec 2020.
BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search. Yunjiang Jiang, Yue Shang, Ziyang Liu, Hongwei Shen, Yun Xiao, Wei Xiong, Sulong Xu, Weipeng P. Yan, Di Jin. 20 Oct 2020.
Pretrained Transformers for Text Ranking: BERT and Beyond. Jimmy J. Lin, Rodrigo Nogueira, Andrew Yates. 13 Oct 2020.
Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor. Xinyu Wang, Yong-jia Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu. 10 Oct 2020.
Efficient Transformers: A Survey. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. 14 Sep 2020.
DualDE: Dually Distilling Knowledge Graph Embedding for Faster and Cheaper Reasoning. Yushan Zhu, Wen Zhang, Mingyang Chen, Hui Chen, Xu-Xin Cheng, Wei Zhang, Huajun Chen. 13 Sep 2020.
Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage. Shijing Si, Rui Wang, Jedrek Wosik, Hao Zhang, D. Dov, Guoyin Wang, Ricardo Henao, Lawrence Carin. 22 Jun 2020.
Knowledge Distillation: A Survey. Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao. 09 Jun 2020.
Movement Pruning: Adaptive Sparsity by Fine-Tuning. Victor Sanh, Thomas Wolf, Alexander M. Rush. 15 May 2020.
Detecting Adverse Drug Reactions from Twitter through Domain-Specific Preprocessing and BERT Ensembling. Amy Breden, L. Moore. 11 May 2020.
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference. Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos. 08 May 2020.
The Right Tool for the Job: Matching Model and Instance Complexities. Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith. 16 Apr 2020.
Squeezed Deep 6DoF Object Detection Using Knowledge Distillation. H. Felix, Walber M. Rodrigues, David Macêdo, Francisco Simões, Adriano Oliveira, Veronica Teichrieb, Cleber Zanchettin. 30 Mar 2020.
Pre-trained Models for Natural Language Processing: A Survey. Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang. 18 Mar 2020.
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou. 25 Feb 2020.
Pre-training Tasks for Embedding-based Large-scale Retrieval. Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar. 10 Feb 2020.
Reducing Transformer Depth on Demand with Structured Dropout. Angela Fan, Edouard Grave, Armand Joulin. 25 Sep 2019.
DocBERT: BERT for Document Classification. Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy J. Lin. 17 Apr 2019.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018.
Convolutional Neural Networks for Sentence Classification. Yoon Kim. 25 Aug 2014.