ResearchTrend.AI
TinyBERT: Distilling BERT for Natural Language Understanding [VLM]
arXiv:1909.10351 · Findings, 2019 · 23 September 2019
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding" (showing 50 of 1,055)
TransFR: Transferable Federated Recommendation with Adapter Tuning on Pre-trained Language Models
Honglei Zhang, Zhiwei Li, Haoxuan Li, Xin Zhou, J. Zhang, Yidong Li · 14 Jan 2026

Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
Jiaying Hong, Ting Zhu, Thanet Markchom, Huizhi Liang · 27 Nov 2025

Deterministic Continuous Replacement: Fast and Stable Module Replacement in Pretrained Transformers
Rowan Bradbury, Aniket Srinivasan Ashok, Sai Ram Kasanagottu, Gunmay Jhingran, Shuai Meng · 24 Nov 2025

A Systematic Study of Compression Ordering for Large Language Models [MQ]
Shivansh Chhawri, Rahul Mahadik, Suparna Rooj · 23 Nov 2025

When Structure Doesn't Help: LLMs Do Not Read Text-Attributed Graphs as Effectively as We Expected
Haotian Xu, Yuning You, Tengfei Ma · 20 Nov 2025

From Low-Rank Features to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers
Huiyuan Tian, Bonan Xu, Shijian Li, Xin Jin · 19 Nov 2025

Dynamic Temperature Scheduler for Knowledge Distillation
Sibgat Ul Islam, Jawad Ibn Ahad, Fuad Rahman, M. R. Amin, Nabeel Mohammed, Shafin Rahman · 14 Nov 2025

CleverBirds: A Multiple-Choice Benchmark for Fine-grained Human Knowledge Tracing
Leonie Bossemeyer, Samuel Heinrich, Grant Van Horn, Oisin Mac Aodha · 11 Nov 2025

Generalizable Blood Cell Detection via Unified Dataset and Faster R-CNN
Siddharth Sahay · 11 Nov 2025

Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture
Tianhao Fu, Xinxin Xu, Weichen Xu, Jue Chen, Ruilong Ren, Bowen Deng, Xinyu Zhao, Jian Cao, Xixin Cao · 10 Nov 2025

MobileLLM-Pro Technical Report
Patrick Huber, Ernie Chang, Wei Wen, Igor Fedorov, Tarek Elgamal, ..., Vikas Chandra, Ahmed Aly, Anuj Kumar, Raghuraman Krishnamoorthi, Adithya Sagar · 10 Nov 2025

CAMP-HiVe: Cyclic Pair Merging based Efficient DNN Pruning with Hessian-Vector Approximation for Resource-Constrained Systems
M. H. Uddin, Sai Krishna Ghanta, Liam Seymour, S. Baidya · 09 Nov 2025

A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy · 07 Nov 2025

Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series [LRM, VLM]
Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang · 03 Nov 2025

Reviving Stale Updates: Data-Free Knowledge Distillation for Asynchronous Federated Learning [FedML]
Baris Askin, Holger Roth, Zhenyu Sun, Carlee Joe-Wong, Gauri Joshi, Ziyue Xu · 01 Nov 2025

Distilling Multilingual Vision-Language Models: When Smaller Models Stay Multilingual [VLM]
Sukrit Sriratanawilai, Jhayahgrit Thongwat, Romrawin Chumpu, Patomporn Payoungkhamdee, Sarana Nutanong, Peerat Limkonchotiwat · 30 Oct 2025

Elastic Architecture Search for Efficient Language Models [KELM]
IEEE International Conference on Multimedia and Expo (ICME), 2025
Shang Wang · 30 Oct 2025

FakeZero: Real-Time, Privacy-Preserving Misinformation Detection for Facebook and X
Soufiane Essahli, Oussama Sarsar, Imane Fouad, Anas Motii, Ahmed Bentajer · 29 Oct 2025

SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications
Edouard Lansiaux, Antoine Simonet, Eric Wiel · 27 Oct 2025

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference [MLLM, VLM]
Divya J. Bajpai, M. Hanawal · 26 Oct 2025

SindBERT, the Sailor: Charting the Seas of Turkish NLP
Raphael Scheible-Schmitt, Stefan Schweter · 24 Oct 2025

Few-Shot Knowledge Distillation of LLMs With Counterfactual Explanations
Faisal Hamman, Pasan Dissanayake, Yanjun Fu, Sanghamitra Dutta · 24 Oct 2025

TernaryCLIP: Efficiently Compressing Vision-Language Models with Ternary Weights and Distilled Knowledge [MQ, VLM]
Shu-Hao Zhang, Wei Tang, Chen Wu, Peng Hu, Nan Li, L. Zhang, Qi Zhang, Shao-Qun Zhang · 23 Oct 2025

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process
Tsai Hor Chan, Feng Wu, Yihang Chen, Guosheng Yin, Lequan Yu · 23 Oct 2025

Mixture of Experts Approaches in Dense Retrieval Tasks [MoE]
Effrosyni Sokli, Pranav Kasela, Georgios Peikos, G. Pasi · 17 Oct 2025

A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness
Fali Wang, Jihai Chen, Shuhua Yang, Ali Al-Lawati, Linli Tang, Hui Liu, Suhang Wang · 14 Oct 2025

CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs [VLM]
Jiwan Kim, Kibum Kim, Sangwoo Seo, Chanyoung Park · 14 Oct 2025

Efficient Adaptive Transformer: An Empirical Study and Reproducible Framework
Jan Miller · 14 Oct 2025

Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation
Arjun Krishnakumar, R. Sukthanker, Hannan Javed Mahadik, Gabriela Kadlecová, Vladyslav Moroshan, Timur Carstensen, Frank Hutter, Aaron Klein · 08 Oct 2025

GUIDE: Guided Initialization and Distillation of Embeddings
Khoa Trinh, Gaurav Menghani, Erik Vee · 07 Oct 2025

Downsized and Compromised?: Assessing the Faithfulness of Model Compression
Moumita Kamal, Douglas A. Talbert · 07 Oct 2025

Learning from All: Concept Alignment for Autonomous Distillation from Multiple Drifting MLLMs [LRM]
Xiaoyu Yang, Jie Lu, En Yu · 05 Oct 2025

Layer-wise dynamic rank for compressing large language models [ALM]
Zhendong Mi, Bian Sun, Grace Li Zhang, Shaoyi Huang · 30 Sep 2025

CURA: Size Isn't All You Need - A Compact Universal Architecture for On-Device Intelligence
Jae-Bum Seo, Muhammad Salman, Lismer Andres Caceres-Najarro · 29 Sep 2025

Knowledge distillation through geometry-aware representational alignment
Prajjwal Bhattarai, Mohammad Amjad, Dmytro Zhylko, Tuka Alhanai · 27 Sep 2025

RestoRect: Degraded Image Restoration via Latent Rectified Flow & Feature Distillation
Shourya Verma, Mengbo Wang, Nadia Atallah Lanman, Ananth Grama · 27 Sep 2025

MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints
Shreyas Gokhale · 26 Sep 2025

COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
Dmitriy Shopkhoev, Denis Makhov, Magauiya Zhussip, Ammar Ali, Stamatios Lefkimmiatis · 26 Sep 2025

Progressive Weight Loading: Accelerating Initial Inference and Gradually Boosting Performance on Resource-Constrained Environments [VLM]
Hyunwoo Kim, Junha Lee, M. Choi, J. Lee, Jaeshin Cho · 26 Sep 2025

Otters: An Energy-Efficient Spiking Transformer via Optical Time-to-First-Spike Encoding
Zhanglu Yan, Jiayi Mao, Qianhui Liu, Fanfan Li, Gang Pan, Tao Luo, Bowen Zhu, Weng-Fai Wong · 23 Sep 2025

Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
T. Han, Linara Adilova, Henning Petzka, Jens Kleesiek, Michael Kamp · 22 Sep 2025

LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations
Robin Vujanic, Thomas Rueckstiess · 16 Sep 2025

A Transformer-Based Cross-Platform Analysis of Public Discourse on the 15-Minute City Paradigm
Gaurab Chhetri, Darrell Anderson, Boniphace Kutela, Subasish Das · 14 Sep 2025

Mitigating Attention Localization in Small Scale: Self-Attention Refinement via One-step Belief Propagation
Nakyung Lee, Yeongoon Kim, Minhae Oh, Suhwan Kim, Jin Woo Koo, Hyewon Jo, Jungwoo Lee · 09 Sep 2025

NoteBar: An AI-Assisted Note-Taking System for Personal Knowledge Management
Josh Wisoff, Yao Tang, Zhengyu Fang, Jordan Guzman, YuTang Wang, Alex Yu · 03 Sep 2025

Efficient Large Language Models with Zero-Shot Adjustable Acceleration
Sajjad Kachuee, M. Sharifkhani · 01 Sep 2025

Spatio-Temporal Pruning for Compressed Spiking Large Language Models
Yi Jiang, Malyaban Bal, Brian Matejek, Susmit Jha, Adam D. Cobb, Abhronil Sengupta · 23 Aug 2025

AMMKD: Adaptive Multimodal Multi-teacher Distillation for Lightweight Vision-Language Models [VLM]
Yuqi Li, Chuanguang Yang, Junhao Dong, Zhengtao Yao, Haoyan Xu, Zeyu Dong, Hansheng Zeng, Zhulin An, Yingli Tian · 23 Aug 2025

A Multimodal-Multitask Framework with Cross-modal Relation and Hierarchical Interactive Attention for Semantic Comprehension
Information Fusion (Inf. Fusion), 2025
Mohammad Zia Ur Rehman, Devraj Raghuvanshi, Umang Jain, Shubhi Bansal, Nagendra Kumar · 22 Aug 2025

Expandable Residual Approximation for Knowledge Distillation [CLL]
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025
Zhaoyi Yan, Binghui Chen, Yunfan Liu, Qixiang Ye · 22 Aug 2025