A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023
26 May 2023
Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Sung Ju Hwang, Alexander Min

Papers citing "A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models"

6 citing papers

Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
Y. Fu, Yin Yu, Xiaotian Han, Runchao Li, Xianxuan Long, Haotian Yu, Pan Li
SyDa
426 · 0 · 0
25 Nov 2024

Weak-to-Strong Generalization beyond Accuracy: a Pilot Study in Safety, Toxicity, and Legal Reasoning
Ruimeng Ye, Yang Xiao, Bo Hui
ALM, ELM, OffRL
369 · 6 · 0
16 Oct 2024

Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling
International Conference on Learning Representations (ICLR), 2024
Yuxuan Yao, Han Wu, Mingyang Liu, Sichun Luo, Xiongwei Han, Jie Liu, Zhijiang Guo, Linqi Song
280 · 21 · 0
03 Oct 2024

Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application
Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen
ALM, KELM
307 · 99 · 0
02 Jul 2024

Step Out and Seek Around: On Warm-Start Training with Incremental Data
Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jose M. Alvarez
CLL
337 · 3 · 0
06 Jun 2024

Co-training and Co-distillation for Quality Improvement and Compression of Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Hongbo Zhang, Sung Ju Hwang, Alexander Min
430 · 2 · 0
06 Nov 2023