TinyBERT: Distilling BERT for Natural Language Understanding

Findings of EMNLP, 2020
23 September 2019
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
arXiv: 1909.10351

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

50 / 1,055 papers shown
An Empirical Study of Knowledge Distillation for Code Understanding Tasks
Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao
21 Aug 2025

Checkmate: interpretable and explainable RSVQA is the endgame
Lucrezia Tosato, Christel Chappuis, Syrielle Montariol, F. Weissgerber, Sylvain Lobry, D. Tuia
18 Aug 2025

Computational Economics in Large Language Models: Exploring Model Behavior and Incentive Design under Resource Constraints
Sandeep Reddy, Kabir Khan, Rohit Patil, Ananya Chakraborty, Faizan A. Khan, Swati Kulkarni, Arjun Verma, Neha Singh
14 Aug 2025

Personalized Product Search Ranking: A Multi-Task Learning Approach with Tabular and Non-Tabular Data
Lalitesh Morishetti, Abhay Kumar, Jonathan Scott, Gabriele Tolomei, Gunjan Sharma, Shanu Vashishtha, Rahul Sridhar, Rohit Chatter, Kannan Achan
13 Aug 2025

Bhav-Net: Knowledge Transfer for Cross-Lingual Antonym vs Synonym Distinction via Dual-Space Graph Transformers
Samyak S. Sanghvi
12 Aug 2025

GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay
Yunan Zhang, Shuoran Jiang, Mengchen Zhao, Yuefeng Li, Yang Fan, Xiangping Wu, Qingcai Chen
06 Aug 2025

CARD: A Cache-Assisted Parallel Speculative Decoding Framework via Query-and-Correct Paradigm for Accelerating LLM Inference
Enyu Zhou, Kai Sheng, Hao Chen, Xin He
06 Aug 2025

DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models
Shantanu Thorat, Andrew Caines
01 Aug 2025

Enhanced Arabic Text Retrieval with Attentive Relevance Scoring
Salah Eddine Bekhouche, Azeddine Benlamoudi, Yazid Bounab, Fadi Dornaika, Abdenour Hadid
31 Jul 2025

On the Sustainability of AI Inferences in the Edge
Ghazal Sobhani, Md. Monzurul Amin Ifath, Tushar Sharma, I. Haque
30 Jul 2025

Model-free Speculative Decoding for Transformer-based ASR with Token Map Drafting
Tuan Vu Ho, Hiroaki Kokubo, Masaaki Yamamoto, Yohei Kawaguchi
29 Jul 2025

Investigating Structural Pruning and Recovery Techniques for Compressing Multimodal Large Language Models: An Empirical Study
Yiran Huang, Lukas Thede, Goran Frehse, Wenjia Xu, Zeynep Akata
28 Jul 2025

The Carbon Cost of Conversation, Sustainability in the Age of Language Models
Sayed Mahbub Hasan Amiri, Prasun Goswami, Md. Mainul Islam, Mohammad Shakhawat Hossen, Sayed Majhab Hasan Amiri, Naznin Akter
26 Jul 2025

Basic Reading Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhi Zhou, Sirui Miao, Xiangyu Duan, Hao Yang, M. Zhang
26 Jul 2025

Collaborative Distillation Strategies for Parameter-Efficient Language Model Deployment
Xiandong Meng, Yan Wu, Yexin Tian, Xin Hu, Tianze Kang, Junliang Du
21 Jul 2025

Flexible Feature Distillation for Large Language Models
Khouloud Saadi, Di Wang
14 Jul 2025

Tractable Representation Learning with Probabilistic Circuits
Steven Braun, Sahil Sidheekh, Antonio Vergari, Martin Mundt, S. Natarajan, Kristian Kersting
06 Jul 2025

General Compression Framework for Efficient Transformer Object Tracking
Lingyi Hong, Jinglun Li, Xinyu Zhou, Shilin Yan, Pinxue Guo, ..., Runze Li, Xingdong Sheng, Wei Zhang, Hong Lu, Wenqiang Zhang
01 Jul 2025

A Hybrid DeBERTa and Gated Broad Learning System for Cyberbullying Detection in English Text
Devesh Kumar
19 Jun 2025

Knowledge Distillation Framework for Accelerating High-Accuracy Neural Network-Based Molecular Dynamics Simulations
Naoki Matsumura, Yuta Yoshimoto, Yuto Iwasaki, Meguru Yamazaki, Yasufumi Sakai
18 Jun 2025

AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes
Jiahao Qiu, Xinzhe Juan, Yimin Wang, L. Yang, Xuan Qi, ..., Hongru Wang, Shilong Liu, Xun Jiang, Liu Leqi, Mengdi Wang
17 Jun 2025

Plug-in and Fine-tuning: Bridging the Gap between Small Language Models and Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Kyeonghyun Kim, Jinhee Jang, Juhwan Choi, Yoonji Lee, Kyohoon Jin, Youngbin Kim
09 Jun 2025

Analyzing Transformer Models and Knowledge Distillation Approaches for Image Captioning on Edge AI
Wing Man Casca Kwok, Yip Chiu Tung, Kunal Bhagchandani
04 Jun 2025

On Fairness of Task Arithmetic: The Role of Task Vectors
Hiroki Naganuma, Kotaro Yoshida, Laura Gomezjurado Gonzalez, Takafumi Horie, Yuji Naraki, Ryotaro Shimizu
30 May 2025

RCCDA: Adaptive Model Updates in the Presence of Concept Drift under a Constrained Resource Budget
Adam Piaseczny, Md Kamran Chowdhury Shisher, Shiqiang Wang, Christopher G. Brinton
30 May 2025

FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression
Jiayi Tian, Ryan Solgi, Jinming Lu, Yifan Yang, Hai Li, Zheng Zhang
29 May 2025

Efficient Large Language Model Inference with Neural Block Linearization
Mete Erdogan, F. Tonin, Volkan Cevher
27 May 2025

SEMFED: Semantic-Aware Resource-Efficient Federated Learning for Heterogeneous NLP Tasks
Sajid Hussain, Muhammad Sohail, Nauman Ali Khan
26 May 2025

Avoid Forgetting by Preserving Global Knowledge Gradients in Federated Learning with Non-IID Data
Abhijit Chunduru, Majid Morafah, Mahdi Morafah, Vishnu Pandi Chellapandi, Ang Li
26 May 2025

Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation
Tanjil Hasan Sakib, Md. Tanzib Hosain, Md. Kishor Morol
26 May 2025

FAR: Function-preserving Attention Replacement for IMC-friendly Inference
Yuxin Ren, Maxwell D Collins, Miao Hu, Huanrui Yang
24 May 2025

On the creation of narrow AI: hierarchy and nonlocality of neural network skills
Eric J. Michaud, Asher Parker-Sartori, Max Tegmark
21 May 2025

Bridging Generative and Discriminative Learning: Few-Shot Relation Extraction via Two-Stage Knowledge-Guided Pre-training
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Quanjiang Guo, Jinchuan Zhang, Sijie Wang, Ling Tian, Zhao Kang, Bin Yan, Weidong Xiao
18 May 2025

ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch
18 May 2025

On Membership Inference Attacks in Knowledge Distillation
Ziyao Cui, Minxing Zhang, Jian Pei
17 May 2025

Distilled Circuits: A Mechanistic Study of Internal Restructuring in Knowledge Distillation
Reilly Haskins, Benjamin Adams
16 May 2025

Tracr-Injection: Distilling Algorithms into Pre-trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Tomás Vergara-Browne, Álvaro Soto
15 May 2025

Private Transformer Inference in MLaaS: A Survey
Yang Li, Xinyu Zhou, Yun Wang, Liangxin Qian, Jun Zhao
15 May 2025

PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts
Yang Su, Na Yan, Yansha Deng, Robert Schober
13 May 2025

KDH-MLTC: Knowledge Distillation for Healthcare Multi-Label Text Classification
Hajar Sakai, Sarah Lam
12 May 2025

Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao, Huy Q. Le, Avi Deb Raha, Phuong-Nam Tran, Apurba Adhikary, Mengchun Zhang, Loc X. Nguyen, Eui-nam Huh, Zhu Han, Choong Seon Hong
11 May 2025

Gender Bias in Explainability: Investigating Performance Disparity in Post-hoc Methods
Conference on Fairness, Accountability and Transparency (FAccT), 2025
Mahdi Dhaini, Ege Erdogan, Nils Feldhus, Gjergji Kasneci
02 May 2025

LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems
Yazan Otoum, Arghavan Asad, Amiya Nayak
01 May 2025

KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan, Guoqing Luo, Michael Bowling, Lili Mou
26 Apr 2025

HMI: Hierarchical Knowledge Management for Efficient Multi-Tenant Inference in Pretrained Language Models
The VLDB Journal (VLDB J.), 2025
Junxuan Zhang, Jiadong Wang, Haoyang Li, Lidan Shou, Ke Chen, Gang Chen, Qin Xie, Guiming Xie, Xuejian Gong
24 Apr 2025

A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms
Chengkai Huang, Hongtao Huang, Tong Yu, Kaige Xie, Junda Wu, Shuai Zhang, Julian McAuley, Dietmar Jannach, Lina Yao
23 Apr 2025

Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability
Daniel Hendriks, Philipp Spitzer, Niklas Kühl, G. Satzger
22 Apr 2025

DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models
Chengyu Wang, Junbing Yan, Yuanhao Yue, Yanjie Liang
21 Apr 2025

Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions
Luyang Fang, Xiaowei Yu, Jianfeng Cai, Yongkai Chen, Shushan Wu, ..., Wenxuan Zhong, Tianming Liu, Ping Ma
20 Apr 2025

Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models
Patrick Haller, Jonas Golde, Alan Akbik
19 Apr 2025