TinyBERT: Distilling BERT for Natural Language Understanding

Findings of EMNLP, 2020
23 September 2019
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
arXiv: 1909.10351

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

50 / 1,055 papers shown
An Empirical Study of Knowledge Distillation for Code Understanding Tasks
Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao
21 Aug 2025

Checkmate: interpretable and explainable RSVQA is the endgame
Lucrezia Tosato, Christel Chappuis, Syrielle Montariol, F. Weissgerber, Sylvain Lobry, D. Tuia
18 Aug 2025

Computational Economics in Large Language Models: Exploring Model Behavior and Incentive Design under Resource Constraints
Sandeep Reddy, Kabir Khan, Rohit Patil, Ananya Chakraborty, Faizan A. Khan, Swati Kulkarni, Arjun Verma, Neha Singh
14 Aug 2025

Personalized Product Search Ranking: A Multi-Task Learning Approach with Tabular and Non-Tabular Data
Lalitesh Morishetti, Abhay Kumar, Jonathan Scott, Gabriele Tolomei, Gunjan Sharma, Shanu Vashishtha, Rahul Sridhar, Rohit Chatter, Kannan Achan
13 Aug 2025

Bhav-Net: Knowledge Transfer for Cross-Lingual Antonym vs Synonym Distinction via Dual-Space Graph Transformers
Samyak S. Sanghvi
12 Aug 2025

GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay
Yunan Zhang, Shuoran Jiang, Mengchen Zhao, Yuefeng Li, Yang Fan, Xiangping Wu, Qingcai Chen
06 Aug 2025

CARD: A Cache-Assisted Parallel Speculative Decoding Framework via Query-and-Correct Paradigm for Accelerating LLM Inference
Enyu Zhou, Kai Sheng, Hao Chen, Xin He
06 Aug 2025

DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models
Shantanu Thorat, Andrew Caines
01 Aug 2025

Enhanced Arabic Text Retrieval with Attentive Relevance Scoring
Salah Eddine Bekhouche, Azeddine Benlamoudi, Yazid Bounab, Fadi Dornaika, Abdenour Hadid
31 Jul 2025

On the Sustainability of AI Inferences in the Edge
Ghazal Sobhani, Md. Monzurul Amin Ifath, Tushar Sharma, I. Haque
30 Jul 2025

Model-free Speculative Decoding for Transformer-based ASR with Token Map Drafting
Tuan Vu Ho, Hiroaki Kokubo, Masaaki Yamamoto, Yohei Kawaguchi
29 Jul 2025

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study
Yiran Huang, Lukas Thede, Goran Frehse, Wenjia Xu, Zeynep Akata
28 Jul 2025

The Carbon Cost of Conversation, Sustainability in the Age of Language Models
Sayed Mahbub Hasan Amiri, Prasun Goswami, Md. Mainul Islam, Mohammad Shakhawat Hossen, Sayed Majhab Hasan Amiri, Naznin Akter
26 Jul 2025

Basic Reading Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhi Zhou, Sirui Miao, Xiangyu Duan, Hao Yang, M. Zhang
26 Jul 2025

Collaborative Distillation Strategies for Parameter-Efficient Language Model Deployment
Xiandong Meng, Yan Wu, Yexin Tian, Xin Hu, Tianze Kang, Junliang Du
21 Jul 2025

Flexible Feature Distillation for Large Language Models
Khouloud Saadi, Di Wang
14 Jul 2025

Tractable Representation Learning with Probabilistic Circuits
Steven Braun, Sahil Sidheekh, Antonio Vergari, Martin Mundt, S. Natarajan, Kristian Kersting
06 Jul 2025

General Compression Framework for Efficient Transformer Object Tracking
Lingyi Hong, Jinglun Li, Xinyu Zhou, Shilin Yan, Pinxue Guo, ..., Runze Li, Xingdong Sheng, Wei Zhang, Hong Lu, Wenqiang Zhang
01 Jul 2025

A Hybrid DeBERTa and Gated Broad Learning System for Cyberbullying Detection in English Text
Devesh Kumar
19 Jun 2025

Knowledge Distillation Framework for Accelerating High-Accuracy Neural Network-Based Molecular Dynamics Simulations
Naoki Matsumura, Yuta Yoshimoto, Yuto Iwasaki, Meguru Yamazaki, Yasufumi Sakai
18 Jun 2025

AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes
Jiahao Qiu, Xinzhe Juan, Yimin Wang, L. Yang, Xuan Qi, ..., Hongru Wang, Shilong Liu, Xun Jiang, Liu Leqi, Mengdi Wang
17 Jun 2025

Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Kyeonghyun Kim, Jinhee Jang, Juhwan Choi, Yoonji Lee, Kyohoon Jin, Youngbin Kim
09 Jun 2025

Analyzing Transformer Models and Knowledge Distillation Approaches for Image Captioning on Edge AI
Wing Man Casca Kwok, Yip Chiu Tung, Kunal Bhagchandani
04 Jun 2025

On Fairness of Task Arithmetic: The Role of Task Vectors
Hiroki Naganuma, Kotaro Yoshida, Laura Gomezjurado Gonzalez, Takafumi Horie, Yuji Naraki, Ryotaro Shimizu
30 May 2025

RCCDA: Adaptive Model Updates in the Presence of Concept Drift under a Constrained Resource Budget
Adam Piaseczny, Md Kamran Chowdhury Shisher, Shiqiang Wang, Christopher G. Brinton
30 May 2025

FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression
Jiayi Tian, Ryan Solgi, Jinming Lu, Yifan Yang, Hai Li, Zheng Zhang
29 May 2025

Efficient Large Language Model Inference with Neural Block Linearization
Mete Erdogan, F. Tonin, Volkan Cevher
27 May 2025

SEMFED: Semantic-Aware Resource-Efficient Federated Learning for Heterogeneous NLP Tasks
Sajid Hussain, Muhammad Sohail, Nauman Ali Khan
26 May 2025

Avoid Forgetting by Preserving Global Knowledge Gradients in Federated Learning with Non-IID Data
Abhijit Chunduru, Majid Morafah, Mahdi Morafah, Vishnu Pandi Chellapandi, Ang Li
26 May 2025

Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation
Tanjil Hasan Sakib, Md. Tanzib Hosain, Md. Kishor Morol
26 May 2025

FAR: Function-preserving Attention Replacement for IMC-friendly Inference
Yuxin Ren, Maxwell D Collins, Miao Hu, Huanrui Yang
24 May 2025

On the creation of narrow AI: hierarchy and nonlocality of neural network skills
Eric J. Michaud, Asher Parker-Sartori, Max Tegmark
21 May 2025

Bridging Generative and Discriminative Learning: Few-Shot Relation Extraction via Two-Stage Knowledge-Guided Pre-training
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Quanjiang Guo, Jinchuan Zhang, Sijie Wang, Ling Tian, Zhao Kang, Bin Yan, Weidong Xiao
18 May 2025

ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch
18 May 2025

On Membership Inference Attacks in Knowledge Distillation
Ziyao Cui, Minxing Zhang, Jian Pei
17 May 2025

Distilled Circuits: A Mechanistic Study of Internal Restructuring in Knowledge Distillation
Reilly Haskins, Benjamin Adams
16 May 2025

Tracr-Injection: Distilling Algorithms into Pre-trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Tomás Vergara-Browne, Álvaro Soto
15 May 2025

Private Transformer Inference in MLaaS: A Survey
Yang Li, Xinyu Zhou, Yun Wang, Liangxin Qian, Jun Zhao
15 May 2025

PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts
Yang Su, Na Yan, Yansha Deng, Robert Schober
13 May 2025

KDH-MLTC: Knowledge Distillation for Healthcare Multi-Label Text Classification
Hajar Sakai, Sarah Lam
12 May 2025

Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao, Huy Q. Le, Avi Deb Raha, Phuong-Nam Tran, Apurba Adhikary, Mengchun Zhang, Loc X. Nguyen, Eui-nam Huh, Zhu Han, Choong Seon Hong
11 May 2025

Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Mahdi Dhaini, Ege Erdogan, Nils Feldhus, Gjergji Kasneci
02 May 2025

LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems
Yazan Otoum, Arghavan Asad, Amiya Nayak
01 May 2025

KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan, Guoqing Luo, Michael Bowling, Lili Mou
26 Apr 2025

HMI: Hierarchical Knowledge Management for Efficient Multi-Tenant Inference in Pretrained Language Models
The VLDB Journal (VLDB J.), 2025
Junxuan Zhang, Jiadong Wang, Haoyang Li, Lidan Shou, Ke Chen, Gang Chen, Qin Xie, Guiming Xie, Xuejian Gong
24 Apr 2025

A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms
Chengkai Huang, Hongtao Huang, Tong Yu, Kaige Xie, Junda Wu, Shuai Zhang, Julian McAuley, Dietmar Jannach, Lina Yao
23 Apr 2025

Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability
Daniel Hendriks, Philipp Spitzer, Niklas Kühl, G. Satzger
22 Apr 2025

DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models
Chengyu Wang, Junbing Yan, Yuanhao Yue, Yanjie Liang
21 Apr 2025

Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions
Luyang Fang, Xiaowei Yu, Jianfeng Cai, Yongkai Chen, Shushan Wu, ..., Wenxuan Zhong, Tianming Liu, Ping Ma
20 Apr 2025

Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models
Patrick Haller, Jonas Golde, Alan Akbik
19 Apr 2025