Sparse Upcycling: Inference Inefficient Finetuning

13 November 2024

Papers citing "Sparse Upcycling: Inference Inefficient Finetuning"


Title
Scaling Fine-Grained MoE Beyond 50B Parameters: Empirical Evaluation and Practical Insights. Jakub Krajewski, Marcin Chochowski, Daniel Korzekwa. MoE ALM. 178 0 0. 03 Jun 2025
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws. International Conference on Machine Learning (ICML), 2023. Nikhil Sardana, Jacob P. Portes, Sasha Doubov, Jonathan Frankle. LRM. 867 120 0. 31 Dec 2023