CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification

2 September 2024

Chun Jason Xue

Papers citing "CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification"

FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou, Zheng Li, J. Zhang, Jue Wang, Y. Wang, Zhongle Xie, Ke Chen, Lidan Shou
Topic: MoE. 09 May 2025

Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang, Luohe Shi, Qiwei Li, Zuchao Li, Ping Wang, Bo Du, Mengjia Shen, Hai Zhao
Topic: MoE. 06 May 2025