Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models

3 January 2022
Made Nindyatama Nityasya, Haryo Akbarianto Wibowo, Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji
arXiv: 2201.00558 (abs · PDF · HTML) · GitHub (20,693★)

Papers citing "Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models"

Showing 6 of 6 citing papers.

On Multilingual Encoder Language Model Compression for Low-Resource Languages
Daniil Gurgurov, Michal Gregor, Josef van Genabith, Simon Ostermann
22 May 2025

Small Language Models in the Real World: Insights from Industrial Text Classification
Lujun Li, Lama Sleem, Niccolo Gentile, Geoffrey Nichil, Radu State
Topics: LLMAG
21 May 2025

The Privileged Students: On the Value of Initialization in Multilingual Knowledge Distillation
Haryo Akbarianto Wibowo, Thamar Solorio, Alham Fikri Aji
24 Jun 2024

Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service
Mirza Alim Mutasodirin, Radityo Eko Prasojo, Achmad F. Abka, Hanif Rasyidi
Topics: VLM
19 Mar 2024

Improving Neural Topic Models with Wasserstein Knowledge Distillation
European Conference on Information Retrieval (ECIR), 2023
Suman Adhya, Debarshi Kumar Sanyal
Topics: BDL
27 Mar 2023

Xception: Deep Learning with Depthwise Separable Convolutions
Computer Vision and Pattern Recognition (CVPR), 2017
François Chollet
Topics: MDE, BDL, PINN
07 Oct 2016