To Distill or Not to Distill? On the Robustness of Robust Knowledge Distillation. Annual Meeting of the Association for Computational Linguistics (ACL), 2024.
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference. International Conference on Learning Representations (ICLR), 2024.
Towards Modular LLMs by Building and Reusing a Library of LoRAs. International Conference on Machine Learning (ICML), 2024.
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models. Knowledge Discovery and Data Mining (KDD), 2024.
KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation. International Conference on Machine Learning (ICML), 2024.