TinyBERT: Distilling BERT for Natural Language Understanding
Findings, 2019
23 September 2019
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

Showing 50 of 1,055 citing papers.

A Dual-Space Framework for General Knowledge Distillation of Large Language Models
Wei Wei, Songming Zhang, Yunlong Liang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou
15 Apr 2025

Multi-Sense Embeddings for Language Models and Knowledge Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qitong Wang, Mohammed J. Zaki, Georgios Kollias, Vasileios Kalantzis
08 Apr 2025

Saliency-driven Dynamic Token Pruning for Large Language Models
Yao Tao, Yehui Tang, Yun Wang, Mingjian Zhu, Hailin Hu, Yunhe Wang
06 Apr 2025

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Information Fusion (Inf. Fusion), 2025
Xiaofeng Han, Shunpeng Chen, Zenghuang Fu, Zhe Feng, Lue Fan, ..., Li Guo, Weiliang Meng, Xiaopeng Zhang, Rongtao Xu, Shibiao Xu
03 Apr 2025

Evidencing Unauthorized Training Data from AI Generated Content using Information Isotopes
Qi Tao, Yin Jinhua, Cai Dongqi, Xie Yueqi, Wang Huili, ..., Zhou Zhili, Wang Shangguang, Lyu Lingjuan, Huang Yongfeng, Lane Nicholas
24 Mar 2025

Efficient Knowledge Distillation via Curriculum Extraction
Shivam Gupta, Sushrut Karmalkar
21 Mar 2025

Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Anshumann, Mohd Abbas Zaidi, Akhil Kedia, Jinwoo Ahn, Taehwak Kwon, Kangwook Lee, Haejun Lee, Joohyung Lee
21 Mar 2025

A Generalist Hanabi Agent
International Conference on Learning Representations (ICLR), 2025
Arjun Vaithilingam Sudhakar, Hadi Nekoei, Mathieu Reymond, Miao Liu, Janarthanan Rajendran, Sarath Chandar
17 Mar 2025

IteRABRe: Iterative Recovery-Aided Block Reduction
Haryo Akbarianto Wibowo, Israfel Salazar, Hideki Tanaka, Masao Utiyama, Alham Fikri Aji, Mary Dabre
08 Mar 2025

SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
Linqi Yang, Xiongwei Zhao, Qihao Sun, Ke Wang, Ao Chen, Peng Kang
07 Mar 2025

Malware Detection at the Edge with Lightweight LLMs: A Performance Evaluation
ACM Transactions on Internet Technology (TOIT), 2025
Christian Rondanini, B. Carminati, E. Ferrari, Antonio Gaudiano, Ashish Kundu
06 Mar 2025

EPEE: Towards Efficient and Effective Foundation Models in Biomedicine
Zaifu Zhan, Shuang Zhou, Huixue Zhou, Ziqiang Liu, Rui Zhang
03 Mar 2025

FedMentalCare: Towards Privacy-Preserving Fine-Tuned LLMs to Analyze Mental Health Status Using Federated Learning Framework
S M Sarwar
27 Feb 2025

XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs
Linyang He, Ercong Nie, Sukru Samet Dindar, Arsalan Firoozi, Adrian Nicolas Florea, ..., Haotian Ye, Jonathan R. Brennan, Helmut Schmid, Hinrich Schütze, Nima Mesgarani
27 Feb 2025

"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts
Rabindra Lamsal, M. Read, S. Karunasekera, Muhammad Imran
24 Feb 2025

Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
Gyeongman Kim, Gyouk Chu, Eunho Yang
18 Feb 2025

PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
Bowei He, Lihao Yin, Hui-Ling Zhen, Xiaokun Zhang, Mingxuan Yuan, Chen Ma
18 Feb 2025

Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Yao-Ching Yu, Tsun-Han Chiang, Cheng-Wei Tsai, Chien-Ming Huang, Wen-Kwang Tsao
16 Feb 2025

Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
09 Feb 2025

A Framework for Double-Blind Federated Adaptation of Foundation Models
Nurbek Tastan, Karthik Nandakumar
03 Feb 2025

Fake News Detection After LLM Laundering: Measurement and Explanation
Rupak Kumar Das, Jonathan Dodge
29 Jan 2025

Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Nicolas Boizard, Kevin El Haddad, Céline Hudelot, Pierre Colombo
28 Jan 2025

Merino: Entropy-driven Design for Generative Language Models on IoT Devices
AAAI Conference on Artificial Intelligence (AAAI), 2024
Youpeng Zhao, Ming Lin, Huadong Tang, Qiang Wu, Jun Wang
28 Jan 2025

Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation
Jan Christian Blaise Cruz, Alham Fikri Aji
22 Jan 2025

Quantification of Large Language Model Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, ..., Hamid Alinejad-Rokny, Min Yang, Yitao Liang, Zhoufutu Wen, Shiwen Ni
22 Jan 2025

GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation
IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025
Shashikant Ilager, Lukas Florian Briem, Ivona Brandić
19 Jan 2025

Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
Zhen Li, Yupeng Su, Runming Yang, C. Xie, Xiping Hu, Zhongwei Xie, Ngai Wong, Hongxia Yang
06 Jan 2025

Lillama: Large Language Models Compression via Low-Rank Feature Distillation
Yaya Sy, Christophe Cerisara, Irina Illina
31 Dec 2024

MatchMiner-AI: An Open-Source Solution for Cancer Clinical Trial Matching
Ethan Cerami, Pavel Trukhanov, Morgan A. Paul, Michael J. Hassett, Irbaz B. Riaz, ..., Jad El Masri, Alys Malcolm, Tali Mazor, Ethan Cerami, Kenneth L. Kehl
23 Dec 2024

Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance
ACM Symposium on Applied Computing (SAC), 2024
Sukrit Leelaluk, Cheng Tang, Valdemar Švábenský, Atsushi Shimada
19 Dec 2024

Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Yining Qi, Yunfeng Fan, Qinliang Su, Xuemin Shen
18 Dec 2024

Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Qianren Mao, Weifeng Jiang, Qingbin Liu, Chenghua Lin, Qian Li, Xianqing Wen, Jianxin Li, Jinhu Lu
01 Dec 2024

Can bidirectional encoder become the ultimate winner for downstream applications of foundation models?
Lewen Yang, Xuanyu Zhou, Juao Fan, Xinyi Xie, Shengxin Zhu
27 Nov 2024

Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
Y. Fu, Yin Yu, Xiaotian Han, Runchao Li, Xianxuan Long, Haotian Yu, Pan Li
25 Nov 2024

Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization
Dun Zeng, Zheshun Wu, Shiyu Liu, Yu Pan, Xiaoying Tang, Zenglin Xu
25 Nov 2024

Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?
Aryan Sajith, Krishna Chaitanya Rao Kathala
24 Nov 2024

Quantifying Knowledge Distillation Using Partial Information Decomposition
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Pasan Dissanayake, Faisal Hamman, Barproda Halder, Ilia Sucholutsky, Qiuyi Zhang, Sanghamitra Dutta
12 Nov 2024

Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
Neural Information Processing Systems (NeurIPS), 2024
Yu-Liang Zhan, Zhong-Yi Lu, Hao Sun, Ze-Feng Gao
10 Nov 2024

Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2024
Chengting Yu, Fengzhao Zhang, Ruizhe Chen, Zuozhu Liu, Shurun Tan, Er-ping Li, Aili Wang
03 Nov 2024

Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision
ACM Transactions on Embedded Computing Systems (TECS), 2024
Xiangzhong Luo, Di Liu, Hao Kong, Shuo Huai, Hui Chen, Guochu Xiong, Weichen Liu
03 Nov 2024

Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation
medRxiv, 2024
Ahmed Akib Jawad Karim, Kazi Hafiz Md. Asad, Md. Golam Rabiul Alam
30 Oct 2024

KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation
Rambod Azimi, Rishav Rishav, M. Teichmann, Samira Ebrahimi Kahou
28 Oct 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, ..., Rakesh Shivanna, Sashank J. Reddi, A. Menon, Rohan Anil, Sanjiv Kumar
24 Oct 2024

Pre-training Distillation for Large Language Models: A Design Space Exploration
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hao Peng, Xin Lv, Yushi Bai, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li
21 Oct 2024

SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments
Artificial Intelligence Applications and Innovations (AIAI), 2024
Syed Abdul Gaffar Shakhadri, Kruthika KR, Rakshit Aralimatti
15 Oct 2024

Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie
13 Oct 2024

Distributed Inference on Mobile Edge and Cloud: An Early Exit based Clustering Approach
Divya J. Bajpai, M. Hanawal
06 Oct 2024

Hyper-multi-step: The Truth Behind Difficult Long-context Tasks
Yijiong Yu, Ma Xiufa, Fang Jianwei, Zhi-liang Xu, Su Guangyao, ..., Zhixiao Qi, Wei Wang, Wen Liu, Ran Chen, Ji Pei
06 Oct 2024

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
International Conference on Learning Representations (ICLR), 2024
Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang
02 Oct 2024

FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices
Zhidong Gao, Yu Zhang, Zhenxiao Zhang, Yanmin Gong, Yuanxiong Guo
01 Oct 2024