TinyBERT: Distilling BERT for Natural Language Understanding
Findings, 2019
23 September 2019
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu

Papers citing "TinyBERT: Distilling BERT for Natural Language Understanding"

Showing 50 of 1,055 citing papers.

A Dual-Space Framework for General Knowledge Distillation of Large Language Models
Wei Wei, Songming Zhang, Yunlong Liang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou
15 Apr 2025

Multi-Sense Embeddings for Language Models and Knowledge Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qitong Wang, Mohammed J. Zaki, Georgios Kollias, Vasileios Kalantzis
08 Apr 2025

Saliency-driven Dynamic Token Pruning for Large Language Models
Yao Tao, Yehui Tang, Yun Wang, Mingjian Zhu, Hailin Hu, Yunhe Wang
06 Apr 2025

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Information Fusion (Inf. Fusion), 2025
Xiaofeng Han, Shunpeng Chen, Zenghuang Fu, Zhe Feng, Lue Fan, ..., Li Guo, Weiliang Meng, Xiaopeng Zhang, Rongtao Xu, Shibiao Xu
03 Apr 2025

Evidencing Unauthorized Training Data from AI Generated Content using Information Isotopes
Qi Tao, Yin Jinhua, Cai Dongqi, Xie Yueqi, Wang Huili, ..., Zhou Zhili, Wang Shangguang, Lyu Lingjuan, Huang Yongfeng, Lane Nicholas
24 Mar 2025

Efficient Knowledge Distillation via Curriculum Extraction
Shivam Gupta, Sushrut Karmalkar
21 Mar 2025

Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Anshumann, Mohd Abbas Zaidi, Akhil Kedia, Jinwoo Ahn, Taehwak Kwon, Kangwook Lee, Haejun Lee, Joohyung Lee
21 Mar 2025

A Generalist Hanabi Agent
International Conference on Learning Representations (ICLR), 2025
Arjun Vaithilingam Sudhakar, Hadi Nekoei, Mathieu Reymond, Miao Liu, Janarthanan Rajendran, Sarath Chandar
17 Mar 2025

IteRABRe: Iterative Recovery-Aided Block Reduction
Haryo Akbarianto Wibowo, Israfel Salazar, Hideki Tanaka, Masao Utiyama, Alham Fikri Aji, Mary Dabre
08 Mar 2025

SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting
Linqi Yang, Xiongwei Zhao, Qihao Sun, Ke Wang, Ao Chen, Peng Kang
07 Mar 2025

Malware Detection at the Edge with Lightweight LLMs: A Performance Evaluation
ACM Transactions on Internet Technology (TOIT), 2025
Christian Rondanini, B. Carminati, E. Ferrari, Antonio Gaudiano, Ashish Kundu
06 Mar 2025

EPEE: Towards Efficient and Effective Foundation Models in Biomedicine
Zaifu Zhan, Shuang Zhou, Huixue Zhou, Ziqiang Liu, Rui Zhang
03 Mar 2025

FedMentalCare: Towards Privacy-Preserving Fine-Tuned LLMs to Analyze Mental Health Status Using Federated Learning Framework
S M Sarwar
27 Feb 2025

XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs
Linyang He, Ercong Nie, Sukru Samet Dindar, Arsalan Firoozi, Adrian Nicolas Florea, ..., Haotian Ye, Jonathan R. Brennan, Helmut Schmid, Hinrich Schütze, Nima Mesgarani
27 Feb 2025

"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts
Rabindra Lamsal, M. Read, S. Karunasekera, Muhammad Imran
24 Feb 2025

Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
Gyeongman Kim, Gyouk Chu, Eunho Yang
18 Feb 2025

PASER: Post-Training Data Selection for Efficient Pruned Large Language Model Recovery
Bowei He, Lihao Yin, Hui-Ling Zhen, Xiaokun Zhang, Mingxuan Yuan, Chen Ma
18 Feb 2025

Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
Yao-Ching Yu, Tsun-Han Chiang, Cheng-Wei Tsai, Chien-Ming Huang, Wen-Kwang Tsao
16 Feb 2025

Performance Analysis of Traditional VQA Models Under Limited Computational Resources
Jihao Gu
09 Feb 2025

A Framework for Double-Blind Federated Adaptation of Foundation Models
Nurbek Tastan, Karthik Nandakumar
03 Feb 2025

Fake News Detection After LLM Laundering: Measurement and Explanation
Rupak Kumar Das, Jonathan Dodge
29 Jan 2025

Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Nicolas Boizard, Kevin El Haddad, Céline Hudelot, Pierre Colombo
28 Jan 2025

Merino: Entropy-driven Design for Generative Language Models on IoT Devices
AAAI Conference on Artificial Intelligence (AAAI), 2024
Youpeng Zhao, Ming Lin, Huadong Tang, Qiang Wu, Jun Wang
28 Jan 2025

Extracting General-use Transformers for Low-resource Languages via Knowledge Distillation
Jan Christian Blaise Cruz, Alham Fikri Aji
22 Jan 2025

Quantification of Large Language Model Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sunbowen Lee, Junting Zhou, Chang Ao, Kaige Li, Xinrun Du, ..., Hamid Alinejad-Rokny, Min Yang, Yitao Liang, Zhoufutu Wen, Shiwen Ni
22 Jan 2025

GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation
IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2025
Shashikant Ilager, Lukas Florian Briem, Ivona Brandić
19 Jan 2025

Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning
Zhen Li, Yupeng Su, Runming Yang, C. Xie, Xiping Hu, Zhongwei Xie, Ngai Wong, Hongxia Yang
06 Jan 2025

Lillama: Large Language Models Compression via Low-Rank Feature Distillation
Yaya Sy, Christophe Cerisara, Irina Illina
31 Dec 2024

MatchMiner-AI: An Open-Source Solution for Cancer Clinical Trial Matching
Ethan Cerami, Pavel Trukhanov, Morgan A. Paul, Michael J. Hassett, Irbaz B. Riaz, ..., Jad El Masri, Alys Malcolm, Tali Mazor, Ethan Cerami, Kenneth L. Kehl
23 Dec 2024

Knowledge Distillation in RNN-Attention Models for Early Prediction of Student Performance
ACM Symposium on Applied Computing (SAC), 2024
Sukrit Leelaluk, Cheng Tang, Valdemar Švábenský, Atsushi Shimada
19 Dec 2024

Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Yining Qi, Yunfeng Fan, Qinliang Su, Xuemin Shen
18 Dec 2024

Lightweight Contenders: Navigating Semi-Supervised Text Mining through Peer Collaboration and Self Transcendence
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Qianren Mao, Weifeng Jiang, Qingbin Liu, Chenghua Lin, Qian Li, Xianqing Wen, Jianxin Li, Jinhu Lu
01 Dec 2024

Can bidirectional encoder become the ultimate winner for downstream applications of foundation models?
Lewen Yang, Xuanyu Zhou, Juao Fan, Xinyi Xie, Shengxin Zhu
27 Nov 2024

Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
Y. Fu, Yin Yu, Xiaotian Han, Runchao Li, Xianxuan Long, Haotian Yu, Pan Li
25 Nov 2024

Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization
Dun Zeng, Zheshun Wu, Shiyu Liu, Yu Pan, Xiaoying Tang, Zenglin Xu
25 Nov 2024

Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?
Aryan Sajith, Krishna Chaitanya Rao Kathala
24 Nov 2024

Quantifying Knowledge Distillation Using Partial Information Decomposition
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Pasan Dissanayake, Faisal Hamman, Barproda Halder, Ilia Sucholutsky, Qiuyi Zhang, Sanghamitra Dutta
12 Nov 2024

Over-parameterized Student Model via Tensor Decomposition Boosted Knowledge Distillation
Neural Information Processing Systems (NeurIPS), 2024
Yu-Liang Zhan, Zhong-Yi Lu, Hao Sun, Ze-Feng Gao
10 Nov 2024

Decoupling Dark Knowledge via Block-wise Logit Distillation for Feature-level Alignment
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2024
Chengting Yu, Fengzhao Zhang, Ruizhe Chen, Zuozhu Liu, Shurun Tan, Er-ping Li, Aili Wang
03 Nov 2024

Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision
ACM Transactions on Embedded Computing Systems (TECS), 2024
Xiangzhong Luo, Di Liu, Hao Kong, Shuo Huai, Hui Chen, Guochu Xiong, Weichen Liu
03 Nov 2024

Larger models yield better results? Streamlined severity classification of ADHD-related concerns using BERT-based knowledge distillation
medRxiv, 2024
Ahmed Akib Jawad Karim, Kazi Hafiz Md. Asad, Md. Golam Rabiul Alam
30 Oct 2024

KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation
Rambod Azimi, Rishav Rishav, M. Teichmann, Samira Ebrahimi Kahou
28 Oct 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, ..., Rakesh Shivanna, Sashank J. Reddi, A. Menon, Rohan Anil, Sanjiv Kumar
24 Oct 2024

Pre-training Distillation for Large Language Models: A Design Space Exploration
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hao Peng, Xin Lv, Yushi Bai, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li
21 Oct 2024

SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments
Artificial Intelligence Applications and Innovations (AIAI), 2024
Syed Abdul Gaffar Shakhadri, Kruthika KR, Rakshit Aralimatti
15 Oct 2024

Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie
13 Oct 2024

Distributed Inference on Mobile Edge and Cloud: An Early Exit based Clustering Approach
Divya J. Bajpai, M. Hanawal
06 Oct 2024

Hyper-multi-step: The Truth Behind Difficult Long-context Tasks
Yijiong Yu, Ma Xiufa, Fang Jianwei, Zhi-liang Xu, Su Guangyao, ..., Zhixiao Qi, Wei Wang, Wen Liu, Ran Chen, Ji Pei
06 Oct 2024

HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
International Conference on Learning Representations (ICLR), 2024
Seanie Lee, Haebin Seong, Dong Bok Lee, Minki Kang, Xiaoyin Chen, Dominik Wagner, Yoshua Bengio, Juho Lee, Sung Ju Hwang
02 Oct 2024

FedPT: Federated Proxy-Tuning of Large Language Models on Resource-Constrained Edge Devices
Zhidong Gao, Yu Zhang, Zhenxiao Zhang, Yanmin Gong, Yuanxiong Guo
01 Oct 2024