Mixture Compressor for Mixture-of-Experts LLMs Gains More

International Conference on Learning Representations (ICLR), 2024
8 October 2024
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
ArXiv (abs) · PDF · HTML · HuggingFace (1 upvote)

Papers citing "Mixture Compressor for Mixture-of-Experts LLMs Gains More"

50 / 63 papers shown
MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping
Yushi Huang
Z. Wang
Zhihang Yuan
Yifu Ding
Ruihao Gong
Jinyang Guo
Xianglong Liu
Jun Zhang
MoE, VLM
261
1
0
19 Nov 2025
Opportunistic Expert Activation: Batch-Aware Expert Routing for Faster Decode Without Retraining
Costin-Andrei Oncescu
Qingyang Wu
Wai Tong Chung
Robert Wu
Bryan Gopal
Junxiong Wang
Tri Dao
Ben Athiwaratkun
MoE
204
0
0
04 Nov 2025
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
Mike Lasby
Ivan Lazarevich
Nish Sinnadurai
Sean Lie
Yani Andrew Ioannou
Vithursan Thangarasa
120
1
0
15 Oct 2025
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Wei Huang
Y. Ge
S. Yang
Yicheng Xiao
Huizi Mao
...
Hongxu Yin
Yao Lu
Xiaojuan Qi
Song Han
Yukang Chen
OffRL
114
0
0
13 Oct 2025
MC#: Mixture Compressor for Mixture-of-Experts Large Models
Wei Huang
Yue Liao
Yukang Chen
Jianhui Liu
Haoru Tan
Si Liu
Shiming Zhang
Shuicheng Yan
Xiaojuan Qi
MoE, MQ
205
0
0
13 Oct 2025
HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
Ke Li
Zheng Yang
Zhongbin Zhou
Feng Xue
Zhonglin Jiang
Wenxiao Wang
MoE
117
0
0
26 Sep 2025
SAIL-VL2 Technical Report
Weijie Yin
Yongjie Ye
Fangxun Shu
Yue Liao
Zijian Kang
...
Han Wang
Wenzhuo Liu
Xiao Liang
Shuicheng Yan
Chao Feng
LRM, VLM
296
4
0
17 Sep 2025
Dropping Experts, Recombining Neurons: Retraining-Free Pruning for Sparse Mixture-of-Experts LLMs
Yixiao Zhou
Ziyu Zhao
Dongzhou Cheng
Zhiliang Wu
Jie Gui
Yi-feng Yang
Fei Wu
Yu Cheng
Hehe Fan
MoMe, MoE
152
3
0
12 Sep 2025
Ban&Pick: Enhancing Performance and Efficiency of MoE-LLMs via Smarter Routing
Yuanteng Chen
Peisong Wang
Yuantian Shao
Nanxin Zeng
Chang Xu
Jian Cheng
MoE
171
0
0
08 Sep 2025
MoPEQ: Mixture of Mixed Precision Quantized Experts
Krishna Teja Chitty-Venkata
Jie Ye
M. Emani
MoE, MQ
85
2
0
02 Sep 2025
GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model
Deepak Kumar
Divakar Yadav
Yash Patel
MoE
192
3
0
22 Aug 2025
EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuanteng Chen
Yuantian Shao
Peisong Wang
Jian Cheng
MoE
161
2
0
03 Aug 2025
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
Jingcong Liang
Siyuan Wang
Miren Tian
Yitong Li
Duyu Tang
Zhongyu Wei
MoE
320
0
0
21 May 2025
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu
Xiuhong Li
Zhihang Yuan
Size Zheng
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ, MoE
942
8
0
09 May 2025
D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
Haodong Wang
Qihua Zhou
Zicong Hong
Song Guo
MoE
215
3
0
17 Apr 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
OffRL, AI4TS, LRM, ReLM, VLM
1.2K
5,342
0
22 Jan 2025
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Mengzhao Chen
Wenqi Shao
Peng Xu
Jiahao Wang
Shiyang Feng
Kaipeng Zhang
Ping Luo
MQ
535
77
0
10 Jul 2024
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Enshu Liu
Junyi Zhu
Zinan Lin
Xuefei Ning
Matthew B. Blaschko
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MoE
220
21
0
01 Jul 2024
Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters
Zhiyu Guo
Hidetaka Kamigaito
Taro Watanabe
240
42
0
18 Jun 2024
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang
Haotong Qin
Yangdong Liu
Yawei Li
Qinshuo Liu
Xianglong Liu
Luca Benini
Michele Magno
Shiming Zhang
Xiaojuan Qi
MQ
359
34
0
23 May 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu Wang
419
171
0
22 Apr 2024
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
Aniruddha Nrusimha
Mayank Mishra
Naigang Wang
Dan Alistarh
Yikang Shen
Yoon Kim
MQ
221
19
0
04 Apr 2024
Toward Inference-optimal Mixture-of-Expert Large Language Models
Longfei Yun
Yonghao Zhuang
Yao Fu
Eric P. Xing
Hao Zhang
MoE
300
13
0
03 Apr 2024
A Comprehensive Evaluation of Quantization Strategies for Large Language Models
Renren Jin
Jiangcun Du
Wuwei Huang
Wei Liu
Jian Luan
Sijin Yu
Deyi Xiong
MQ
287
67
0
26 Feb 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
Xudong Lu
Zijun Chen
Yuhui Xu
Aojun Zhou
Siyuan Huang
Bo Zhang
Junchi Yan
Jiaming Song
MoE
295
70
0
22 Feb 2024
ApiQ: Finetuning of 2-Bit Quantized Large Language Model
Baohao Liao
Christian Herold
Shahram Khadivi
Christof Monz
CLL, MQ
380
23
0
07 Feb 2024
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng
Jerry Chee
Qingyao Sun
Volodymyr Kuleshov
Christopher De Sa
MQ
465
215
0
06 Feb 2024
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
International Conference on Machine Learning (ICML), 2024
Wei Huang
Yangdong Liu
Haotong Qin
Ying Li
Shiming Zhang
Xianglong Liu
Michele Magno
Xiaojuan Qi
MQ
317
128
0
06 Feb 2024
Extreme Compression of Large Language Models via Additive Quantization
International Conference on Machine Learning (ICML), 2024
Vage Egiazarian
Andrei Panferov
Denis Kuznedelev
Elias Frantar
Artem Babenko
Dan Alistarh
MQ
418
149
0
11 Jan 2024
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Damai Dai
Chengqi Deng
Chenggang Zhao
R. X. Xu
Huazuo Gao
...
Panpan Huang
Fuli Luo
Chong Ruan
Zhifang Sui
W. Liang
MoE
375
601
0
11 Jan 2024
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE, LLMAG
519
1,569
0
08 Jan 2024
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
International Conference on Learning Representations (ICLR), 2023
Han Guo
P. Greengard
Eric P. Xing
Yoon Kim
MQ
467
82
0
20 Nov 2023
PB-LLM: Partially Binarized Large Language Models
International Conference on Learning Representations (ICLR), 2023
Yuzhang Shang
Zhihang Yuan
Qiang Wu
Zhen Dong
MQ
377
80
0
29 Sep 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yushi Bai
Xin Lv
Jiajie Zhang
Hong Lyu
Jiankai Tang
...
Aohan Zeng
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
LLMAG, RALM
325
914
0
28 Aug 2023
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
International Conference on Learning Representations (ICLR), 2023
Wenqi Shao
Mengzhao Chen
Zhaoyang Zhang
Peng Xu
Lirui Zhao
Zhiqiang Li
Kaipeng Zhang
Shiyang Feng
Yu Qiao
Ping Luo
MQ
500
311
0
25 Aug 2023
A Survey on Model Compression for Large Language Models
Transactions of the Association for Computational Linguistics (TACL), 2023
Xunyu Zhu
Jian Li
Yong Liu
Can Ma
Weiping Wang
378
355
0
15 Aug 2023
Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study
International Conference on Language Resources and Evaluation (LREC), 2023
Peiyu Liu
Zikang Liu
Ze-Feng Gao
Dawei Gao
Wayne Xin Zhao
Yaliang Li
Bolin Ding
Ji-Rong Wen
MQ, LRM
263
44
0
16 Jul 2023
A Survey on Evaluation of Large Language Models
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Yu-Chu Chang
Xu Wang
Yongfeng Zhang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM, LM&MA, ALM
700
2,732
0
06 Jul 2023
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
Neural Information Processing Systems (NeurIPS), 2023
Zhenyu Zhang
Ying Sheng
Wanrong Zhu
Tianlong Chen
Lianmin Zheng
...
Yuandong Tian
Christopher Ré
Clark W. Barrett
Zinan Lin
Beidi Chen
VLM
755
474
0
24 Jun 2023
A Simple and Effective Pruning Approach for Large Language Models
International Conference on Learning Representations (ICLR), 2023
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
493
653
0
20 Jun 2023
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
International Conference on Learning Representations (ICLR), 2023
Tim Dettmers
Ruslan Svirschevski
Vage Egiazarian
Denis Kuznedelev
Elias Frantar
Saleh Ashkboos
Alexander Borzunov
Torsten Hoefler
Dan Alistarh
MQ
187
335
0
05 Jun 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Conference on Machine Learning and Systems (MLSys), 2023
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDL, MQ
832
961
0
01 Jun 2023
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zechun Liu
Barlas Oğuz
Changsheng Zhao
Ernie Chang
Pierre Stock
Yashar Mehdad
Yangyang Shi
Raghuraman Krishnamoorthi
Vikas Chandra
MQ
263
294
0
29 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
4.6K
20,902
0
15 Mar 2023
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
Haiyang Huang
Newsha Ardalani
Anna Y. Sun
Liu Ke
Hsien-Hsin S. Lee
Anjali Sridhar
Shruti Bhosale
Carole-Jean Wu
Benjamin C. Lee
MoE
173
37
0
10 Mar 2023
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
International Conference on Machine Learning (ICML), 2023
Elias Frantar
Dan Alistarh
VLM
560
1,037
0
02 Jan 2023
Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yeskendir Koishekenov
Alexandre Berard
Vassilina Nikoulina
MoE
251
41
0
19 Dec 2022
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Conference on Machine Learning and Systems (MLSys), 2022
Trevor Gale
Deepak Narayanan
C. Young
Matei A. Zaharia
MoE
203
159
0
29 Nov 2022
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
International Conference on Machine Learning (ICML), 2022
Guangxuan Xiao
Ji Lin
Mickael Seznec
Hao Wu
Julien Demouth
Song Han
MQ
800
1,187
0
18 Nov 2022
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Elias Frantar
Saleh Ashkboos
Torsten Hoefler
Dan Alistarh
MQ
510
1,525
0
31 Oct 2022