
Mixture Compressor for Mixture-of-Experts LLMs Gains More

International Conference on Learning Representations (ICLR), 2025
8 October 2024
Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi
Tags: MoE
arXiv: 2410.06270 (abs) · PDF · HTML · Hugging Face (1 upvote)

Papers citing "Mixture Compressor for Mixture-of-Experts LLMs Gains More"

13 / 63 papers shown
On the Representation Collapse of Sparse Mixture of Experts
Neural Information Processing Systems (NeurIPS), 2022
Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, ..., Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
Tags: MoMe, MoE
310 · 136 · 0
20 Apr 2022

A Fast Post-Training Pruning Framework for Transformers
Neural Information Processing Systems (NeurIPS), 2022
Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, A. Gholami
242 · 203 · 0
29 Mar 2022

Training Verifiers to Solve Math Word Problems
K. Cobbe, V. Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, ..., Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman
Tags: ReLM, OffRL, LRM
1.1K · 6,875 · 0
27 Oct 2021

Scalable and Efficient MoE Training for Multitask Multilingual Models
Young Jin Kim, A. A. Awan, Alexandre Muzio, Andres Felipe Cruz Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Awadalla
Tags: MoE
272 · 98 · 0
22 Sep 2021

Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé, ..., Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba
Tags: ELM, ALM
2.1K · 7,824 · 0
07 Jul 2021

Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Basel Alomair, Jacob Steinhardt
Tags: ReLM, FaML
909 · 3,967 · 0
05 Mar 2021

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
Neural Information Processing Systems (NeurIPS), 2021
Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, S. Naor, Daniel Soudry
282 · 134 · 0
16 Feb 2021

Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
Tags: BDL
2.0K · 52,836 · 0
28 May 2020

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks
Neural Information Processing Systems (NeurIPS), 2019
Zhen Dong, Z. Yao, Yaohui Cai, Daiyaan Arfeen, A. Gholami, Michael W. Mahoney, Kurt Keutzer
Tags: MQ
254 · 335 · 0
10 Nov 2019

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Journal of Machine Learning Research (JMLR), 2019
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Tags: AIMat
1.5K · 23,849 · 0
23 Oct 2019

Channel Pruning for Accelerating Very Deep Neural Networks
Yihui He, Xiangyu Zhang, Jian Sun
638 · 2,686 · 0
19 Jul 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
International Conference on Learning Representations (ICLR), 2017
Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean
Tags: MoE
608 · 3,755 · 0
23 Jan 2017

XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi
Tags: MQ
657 · 4,595 · 0
16 Mar 2016