TinyBERT: Distilling BERT for Natural Language Understanding [VLM]
Findings of EMNLP, 2020
arXiv:1909.10351 (v5, latest), submitted 23 September 2019
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

Showing 50 of 1,055 citing papers.

Art2Music: Generating Music for Art Images with Multi-modal Feeling Alignment
Jiaying Hong, Ting Zhu, Thanet Markchom, Huizhi Liang (27 Nov 2025)

Deterministic Continuous Replacement: Fast and Stable Module Replacement in Pretrained Transformers
Rowan Bradbury, Aniket Srinivasan Ashok, Sai Ram Kasanagottu, Gunmay Jhingran, Shuai Meng (24 Nov 2025)

A Systematic Study of Compression Ordering for Large Language Models [MQ]
Shivansh Chhawri, Rahul Mahadik, Suparna Rooj (23 Nov 2025)

When Structure Doesn't Help: LLMs Do Not Read Text-Attributed Graphs as Effectively as We Expected
Haotian Xu, Yuning You, Tengfei Ma (20 Nov 2025)

From Low-Rank Features to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers
Huiyuan Tian, Bonan Xu, Shijian Li, Xin Jin (19 Nov 2025)

Dynamic Temperature Scheduler for Knowledge Distillation
Sibgat Ul Islam, Jawad Ibn Ahad, Fuad Rahman, M. R. Amin, Nabeel Mohammed, Shafin Rahman (14 Nov 2025)

Generalizable Blood Cell Detection via Unified Dataset and Faster R-CNN
Siddharth Sahay (11 Nov 2025)

CleverBirds: A Multiple-Choice Benchmark for Fine-grained Human Knowledge Tracing
Leonie Bossemeyer, Samuel Heinrich, Grant Van Horn, Oisin Mac Aodha (11 Nov 2025)

Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture
Tianhao Fu, Xinxin Xu, Weichen Xu, Jue Chen, Ruilong Ren, Bowen Deng, Xinyu Zhao, Jian Cao, Xixin Cao (10 Nov 2025)

MobileLLM-Pro Technical Report
Patrick Huber, Ernie Chang, Wei Wen, Igor Fedorov, Tarek Elgamal, ..., Vikas Chandra, Ahmed Aly, Anuj Kumar, Raghuraman Krishnamoorthi, Adithya Sagar (10 Nov 2025)

CAMP-HiVe: Cyclic Pair Merging based Efficient DNN Pruning with Hessian-Vector Approximation for Resource-Constrained Systems
M. H. Uddin, Sai Krishna Ghanta, Liam Seymour, S. Baidya (09 Nov 2025)

A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy (07 Nov 2025)

Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series [LRM, VLM]
Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang (03 Nov 2025)

Reviving Stale Updates: Data-Free Knowledge Distillation for Asynchronous Federated Learning [FedML]
Baris Askin, Holger Roth, Zhenyu Sun, Carlee Joe-Wong, Gauri Joshi, Ziyue Xu (01 Nov 2025)

Elastic Architecture Search for Efficient Language Models [KELM]
IEEE International Conference on Multimedia and Expo (ICME), 2025
Shang Wang (30 Oct 2025)

Distilling Multilingual Vision-Language Models: When Smaller Models Stay Multilingual [VLM]
Sukrit Sriratanawilai, Jhayahgrit Thongwat, Romrawin Chumpu, Patomporn Payoungkhamdee, Sarana Nutanong, Peerat Limkonchotiwat (30 Oct 2025)

FakeZero: Real-Time, Privacy-Preserving Misinformation Detection for Facebook and X
Soufiane Essahli, Oussama Sarsar, Imane Fouad, Anas Motii, Ahmed Bentajer (29 Oct 2025)

SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications
Edouard Lansiaux (27 Oct 2025)

FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference [MLLM, VLM]
Divya J. Bajpai, M. Hanawal (26 Oct 2025)

SindBERT, the Sailor: Charting the Seas of Turkish NLP
Raphael Scheible-Schmitt, Stefan Schweter (24 Oct 2025)

Few-Shot Knowledge Distillation of LLMs With Counterfactual Explanations
Faisal Hamman, Pasan Dissanayake, Yanjun Fu, Sanghamitra Dutta (24 Oct 2025)

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process
Tsai Hor Chan, Feng Wu, Yihang Chen, Guosheng Yin, Lequan Yu (23 Oct 2025)

TernaryCLIP: Efficiently Compressing Vision-Language Models with Ternary Weights and Distilled Knowledge [MQ, VLM]
Shu-Hao Zhang, Wei Tang, Chen Wu, Peng Hu, Nan Li, L. Zhang, Qi Zhang, Shao-Qun Zhang (23 Oct 2025)

Mixture of Experts Approaches in Dense Retrieval Tasks [MoE]
Effrosyni Sokli, Pranav Kasela, Georgios Peikos, G. Pasi (17 Oct 2025)

A Survey on Collaborating Small and Large Language Models for Performance, Cost-effectiveness, Cloud-edge Privacy, and Trustworthiness
Fali Wang, Jihai Chen, Shuhua Yang, Ali Al-Lawati, Linli Tang, Hui Liu, Suhang Wang (14 Oct 2025)

CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs [VLM]
Jiwan Kim, Kibum Kim, Sangwoo Seo, Chanyoung Park (14 Oct 2025)

Efficient Adaptive Transformer: An Empirical Study and Reproducible Framework
Jan Miller (14 Oct 2025)

Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation
Arjun Krishnakumar, R. Sukthanker, Hannan Javed Mahadik, Gabriela Kadlecová, Vladyslav Moroshan, Timur Carstensen, Frank Hutter, Aaron Klein (08 Oct 2025)

GUIDE: Guided Initialization and Distillation of Embeddings
Khoa Trinh, Gaurav Menghani, Erik Vee (07 Oct 2025)

Downsized and Compromised?: Assessing the Faithfulness of Model Compression
Moumita Kamal, Douglas A. Talbert (07 Oct 2025)

Learning from All: Concept Alignment for Autonomous Distillation from Multiple Drifting MLLMs [LRM]
Xiaoyu Yang, Jie Lu, En Yu (05 Oct 2025)

Layer-wise dynamic rank for compressing large language models [ALM]
Zhendong Mi, Bian Sun, Grace Li Zhang, Shaoyi Huang (30 Sep 2025)

CURA: Size Isn't All You Need - A Compact Universal Architecture for On-Device Intelligence
Jae-Bum Seo, Muhammad Salman, Lismer Andres Caceres-Najarro (29 Sep 2025)

Knowledge distillation through geometry-aware representational alignment
Prajjwal Bhattarai, Mohammad Amjad, Dmytro Zhylko, Tuka Alhanai (27 Sep 2025)

RestoRect: Degraded Image Restoration via Latent Rectified Flow & Feature Distillation
Shourya Verma, Mengbo Wang, Nadia Atallah Lanman, Ananth Grama (27 Sep 2025)

MonoCon: A general framework for learning ultra-compact high-fidelity representations using monotonicity constraints
Shreyas Gokhale (26 Sep 2025)

COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
Dmitriy Shopkhoev, Denis Makhov, Magauiya Zhussip, Ammar Ali, Stamatios Lefkimmiatis (26 Sep 2025)

Progressive Weight Loading: Accelerating Initial Inference and Gradually Boosting Performance on Resource-Constrained Environments [VLM]
Hyunwoo Kim, Junha Lee, M. Choi, J. Lee, Jaeshin Cho (26 Sep 2025)

Otters: An Energy-Efficient Spiking Transformer via Optical Time-to-First-Spike Encoding
Zhanglu Yan, Jiayi Mao, Qianhui Liu, Fanfan Li, Gang Pan, Tao Luo, Bowen Zhu, Weng-Fai Wong (23 Sep 2025)

Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
T. Han, Linara Adilova, Henning Petzka, Jens Kleesiek, Michael Kamp (22 Sep 2025)

LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations
Robin Vujanic, Thomas Rueckstiess (16 Sep 2025)

A Transformer-Based Cross-Platform Analysis of Public Discourse on the 15-Minute City Paradigm
Gaurab Chhetri, Darrell Anderson, Boniphace Kutela, Subasish Das (14 Sep 2025)

Mitigating Attention Localization in Small Scale: Self-Attention Refinement via One-step Belief Propagation
Nakyung Lee, Yeongoon Kim, Minhae Oh, Suhwan Kim, Jin Woo Koo, Hyewon Jo, Jungwoo Lee (09 Sep 2025)

NoteBar: An AI-Assisted Note-Taking System for Personal Knowledge Management
Josh Wisoff, Yao Tang, Zhengyu Fang, Jordan Guzman, YuTang Wang, Alex Yu (03 Sep 2025)

Efficient Large Language Models with Zero-Shot Adjustable Acceleration
Sajjad Kachuee, M. Sharifkhani (01 Sep 2025)

Spatio-Temporal Pruning for Compressed Spiking Large Language Models
Yi Jiang, Malyaban Bal, Brian Matejek, Susmit Jha, Adam D. Cobb, Abhronil Sengupta (23 Aug 2025)

AMMKD: Adaptive Multimodal Multi-teacher Distillation for Lightweight Vision-Language Models [VLM]
Yuqi Li, Chuanguang Yang, Junhao Dong, Zhengtao Yao, Haoyan Xu, Zeyu Dong, Hansheng Zeng, Zhulin An, Yingli Tian (23 Aug 2025)

A Multimodal-Multitask Framework with Cross-modal Relation and Hierarchical Interactive Attention for Semantic Comprehension
Information Fusion (Inf. Fusion), 2025
Mohammad Zia Ur Rehman, Devraj Raghuvanshi, Umang Jain, Shubhi Bansal, Nagendra Kumar (22 Aug 2025)

Expandable Residual Approximation for Knowledge Distillation [CLL]
IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025
Zhaoyi Yan, Binghui Chen, Yunfan Liu, Qixiang Ye (22 Aug 2025)

An Empirical Study of Knowledge Distillation for Code Understanding Tasks
Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao (21 Aug 2025)