SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs. International Conference on Learning Representations (ICLR), 2024.
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models. Neural Information Processing Systems (NeurIPS), 2024.
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training. Neural Information Processing Systems (NeurIPS), 2024.
Accelerating Transformer Pre-training with 2:4 Sparsity. International Conference on Machine Learning (ICML), 2024.
Scaling Laws for Sparsely-Connected Foundation Models. International Conference on Learning Representations (ICLR), 2024.