LLM in a Flash: Efficient Large Language Model Inference with Limited Memory. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving. Symposium on Operating Systems Principles (SOSP), 2024 |
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism. International Conference on Machine Learning (ICML), 2024 |
SPIN: Sparsifying and Integrating Internal Neurons in Large Language Models for Text Classification. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 |
Contrastive Representation Distillation. International Conference on Learning Representations (ICLR), 2020 |