Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques

4 June 2024

Papers citing "Towards Efficient Mixture of Experts: A Holistic Study of Compression Techniques"

7 / 7 papers shown

Title
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Yixin Song Zeyu Mi Haotong Xie Haibo Chen BDL 109 114 0 16 Dec 2023
Mixture-of-Experts with Expert Choice Routing Yan-Quan Zhou Tao Lei Han-Chu Liu Nan Du Yanping Huang Vincent Zhao Andrew M. Dai Zhifeng Chen Quoc V. Le James Laudon MoE 137 323 0 18 Feb 2022
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference Sneha Kudugunta Yanping Huang Ankur Bapna M. Krikun Dmitry Lepikhin Minh-Thang Luong Orhan Firat MoE 119 87 0 24 Sep 2021
Zero-Shot Text-to-Image Generation Aditya A. Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen Ilya Sutskever VLM 253 4,735 0 24 Feb 2021
Pruning and Quantization for Deep Neural Network Acceleration: A Survey Tailin Liang C. Glossner Lei Wang Shaobo Shi Xiaotong Zhang MQ 118 665 0 24 Jan 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe ... Horace He Anish Thite Noa Nabeshima Shawn Presser Connor Leahy AIMat 236 1,508 0 31 Dec 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,003 0 20 Apr 2018