KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation. International Conference on Machine Learning (ICML), 2024 |
Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections. Symposium on Operating Systems Principles (SOSP), 2023 |
Scaling Laws of RoPE-based Extrapolation. International Conference on Learning Representations (ICLR), 2023 |
Ring Attention with Blockwise Transformers for Near-Infinite Context. International Conference on Learning Representations (ICLR), 2023 |
Reducing Activation Recomputation in Large Transformer Models. Conference on Machine Learning and Systems (MLSys), 2022 |
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. International Conference on Parallel Processing (ICPP), 2021 |