ResearchTrend.AI
TinyBERT: Distilling BERT for Natural Language Understanding

Findings (Findings), 2019
23 September 2019
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
    VLM

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

50 / 1,055 papers shown
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
Rongzhi Zhang
Jiaming Shen
Tianqi Liu
Haorui Wang
Zhen Qin
Feng Han
Jialu Liu
Simon Baumgartner
Michael Bendersky
Chao Zhang
199
13
0
05 Jun 2024
Seeing the Forest through the Trees: Data Leakage from Partial Transformer Gradients
Weijun Li
Xingliang Yuan
Mark Dras
PILM
266
4
0
03 Jun 2024
Posterior Label Smoothing for Node Classification
Jaeseung Heo
M. Park
Dongwoo Kim
UQCV
577
0
0
01 Jun 2024
STAT: Shrinking Transformers After Training
Megan Flynn
Alexander Wang
Dean Edward Alvarez
Christopher De Sa
Anil Damle
311
3
0
29 May 2024
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
Yang Zhang
Yawei Li
Xinpeng Wang
Qianli Shen
Barbara Plank
Bernd Bischl
Mina Rezaei
Kenji Kawaguchi
249
20
0
28 May 2024
Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques
Siva Rajesh Kasa
Aniket Goel
Karan Gupta
Sumegh Roychowdhury
Anish Bhanushali
Nikhil Pattisapu
Prasanna Srinivasa Murthy
255
6
0
20 May 2024
Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks
Taiyuan Mei
Yun Zi
X. Cheng
Zijun Gao
Qi Wang
Haowei Yang
248
26
0
20 May 2024
Feature-Adaptive and Data-Scalable In-Context Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jiahao Li
Quan Wang
Li Zhang
Guoqing Jin
Zhendong Mao
275
4
0
17 May 2024
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Mahsa Khoshnoodi
Vinija Jain
Mingye Gao
Malavika Srikanth
Vasu Sharma
OffRL
350
9
0
15 May 2024
Exploring Graph-based Knowledge: Multi-Level Feature Distillation via Channels Relational Graph
Zhiwei Wang
Jun Huang
Longhua Ma
Chengyu Wu
Hongyu Ma
286
0
0
14 May 2024
ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis
Mohammad Amaz Uddin
Muhammad Nazrul Islam
Leandros A. Maglaras
Helge Janicke
Iqbal H. Sarker
173
13
0
12 May 2024
A Scene-aware Models Adaptation Scheme for Cross-scene Online Inference on Mobile Devices
IEEE International Conference on Distributed Computing Systems (ICDCS), 2024
Yunzhe Li
Hongzi Zhu
Zhuohong Deng
Yunlong Cheng
Liang Zhang
Shan Chang
Minyi Guo
264
4
0
09 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGenLM&Ro
365
84
0
06 May 2024
Structural Pruning of Pre-trained Language Models via Neural Architecture Search
Aaron Klein
Jacek Golebiowski
Xingchen Ma
Valerio Perrone
Cédric Archambeau
209
5
0
03 May 2024
UniGen: Universal Domain Generalization for Sentiment Classification via Zero-shot Dataset Generation
Juhwan Choi
Yeonghwa Kim
Seunguk Yu
Jungmin Yun
Youngbin Kim
216
8
0
02 May 2024
Knowledge Distillation vs. Pretraining from Scratch under a Fixed (Computation) Budget
Minh Duc Bui
Fabian David Schmidt
Goran Glavaš
Katharina von der Wense
189
1
0
30 Apr 2024
EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization
Jianzong Wang
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
189
1
0
30 Apr 2024
Annotator-Centric Active Learning for Subjective NLP Tasks
Michiel van der Meer
Neele Falk
P. Murukannaiah
Enrico Liscio
515
17
0
24 Apr 2024
Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation
Lasal Jayawardena
Prasan Yapa
BDL
266
3
0
19 Apr 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
291
6
0
18 Apr 2024
ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation
Divyang Doshi
Jung-Eun Kim
207
2
0
15 Apr 2024
MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution
Yuxuan Jiang
Chen Feng
Fan Zhang
David Bull
SupR
273
24
0
15 Apr 2024
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
287
15
0
13 Apr 2024
Constrained C-Test Generation via Mixed-Integer Programming
Ji-Ung Lee
Marc E. Pfetsch
Iryna Gurevych
181
0
0
12 Apr 2024
CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Longwei Zou
Qingyang Wang
Han Zhao
Tingfeng Liu
Yi Yang
Yangdong Deng
248
1
0
10 Apr 2024
What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models
Busayo Awobade
Mardiyyah Oduwole
Steven Kolawole
205
1
0
06 Apr 2024
Okay, Let's Do This! Modeling Event Coreference with Generated Rationales and Knowledge Distillation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Abhijnan Nath
Shadi Manafi
Avyakta Chelle
Nikhil Krishnaswamy
236
5
0
04 Apr 2024
Efficiently Distilling LLMs for Edge Applications
Achintya Kundu
Fabian Lim
Aaron Chew
L. Wynter
Penny Chong
Rhui Dih Lee
223
10
0
01 Apr 2024
A Comprehensive Study on NLP Data Augmentation for Hate Speech Detection: Legacy Methods, BERT, and LLMs
Md Saroar Jahan
Mourad Oussalah
D. Beddiar
Jhuma Kabir Mim
Nabil Arhab
189
16
0
30 Mar 2024
Are Compressed Language Models Less Subgroup Robust?
Leonidas Gee
Andrea Zugarini
Novi Quadrianto
187
2
0
26 Mar 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
434
158
0
26 Mar 2024
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Computer Vision and Pattern Recognition (CVPR), 2024
Jianqing Zhang
Yang Liu
Yang Hua
Jian Cao
254
20
0
23 Mar 2024
Evaluating Unsupervised Dimensionality Reduction Methods for Pretrained Sentence Embeddings
Gaifan Zhang
Yi Zhou
Danushka Bollegala
221
11
0
20 Mar 2024
Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models
Adian Liusie
Yassir Fathullah
Mark Gales
110
7
0
20 Mar 2024
TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Pengcheng Jiang
Cao Xiao
Zifeng Wang
Parminder Bhatia
Jimeng Sun
Jiawei Han
LRM
232
16
0
15 Mar 2024
FBPT: A Fully Binary Point Transformer
IEEE International Conference on Robotics and Automation (ICRA), 2024
Zhixing Hou
Yuzhang Shang
Yan Yan
MQ
233
1
0
15 Mar 2024
Measuring Bias in a Ranked List using Term-based Representations
European Conference on Information Retrieval (ECIR), 2024
Amin Abolghasemi
Leif Azzopardi
Arian Askari
Maarten de Rijke
Suzan Verberne
195
9
0
09 Mar 2024
Learning to Maximize Mutual Information for Chain-of-Thought Distillation
Xin Chen
Hanxian Huang
Yanjun Gao
Yi Wang
Jishen Zhao
Ke Ding
361
27
0
05 Mar 2024
Improving the Downstream Performance of Mixture-of-Experts Transformers via Weak Vanilla Transformers
Xin Lu
Yanyan Zhao
Bing Qin
Ting Liu
MoE
118
1
0
04 Mar 2024
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin
Seonil Son
Jemin Park
Youngseok Kim
Hyungjong Noh
Yeonsoo Lee
336
4
0
03 Mar 2024
Differentially Private Knowledge Distillation via Synthetic Text Generation
James Flemings
Murali Annavaram
SyDa
423
18
0
01 Mar 2024
Sinkhorn Distance Minimization for Knowledge Distillation
Xiao Cui
Yulei Qin
Yuting Gao
Enwei Zhang
Zihan Xu
Tong Wu
Ke Li
Xing Sun
Wen-gang Zhou
Houqiang Li
214
19
0
27 Feb 2024
Layer-wise Regularized Dropout for Neural Language Models
Shiwen Ni
Min Yang
Ruifeng Xu
Chengming Li
Xiping Hu
126
0
0
26 Feb 2024
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report
Fanqi Wan
Ziyi Yang
Longguang Zhong
Xiaojun Quan
Xinting Huang
Wei Bi
MoMe
518
2
0
25 Feb 2024
$C^3$: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding
Taixi Lu
Haoyu Wang
Huajie Shao
Jing Gao
Huaxiu Yao
170
0
0
25 Feb 2024
Divide-or-Conquer? Which Part Should You Distill Your LLM?
Zhuofeng Wu
Richard He Bai
Aonan Zhang
Jiatao Gu
V. Vydiswaran
Navdeep Jaitly
Yizhe Zhang
LRM
316
22
0
22 Feb 2024
Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
Weilin Zhao
Yuxiang Huang
Xu Han
Wang Xu
Chaojun Xiao
Xinrong Zhang
Yewei Fang
Kaihuo Zhang
Zhiyuan Liu
Maosong Sun
280
23
0
21 Feb 2024
EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization
Zhendong Xiao
Changhao Chen
Shan Yang
Wu Wei
196
4
0
21 Feb 2024
An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach
Mohammad Amaz Uddin
Md Mahiuddin
Iqbal H. Sarker
208
41
0
21 Feb 2024
A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu
Ming Li
Chongyang Tao
Tao Shen
Reynold Cheng
Jinyang Li
Can Xu
Dacheng Tao
Wanrong Zhu
KELMVLM
469
238
0
20 Feb 2024