Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts

12 July 2024

Papers citing "Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts"


Shwai He, Daize Dong, Liang Ding, Ang Li. "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques." 04 Jun 2024. (MoE)
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. "Scaling Laws for Neural Language Models." 23 Jan 2020.