Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters (arXiv:2406.05955)

10 June 2024
Yixin Song, Haotong Xie, Zhengyan Zhang, Bo Wen, Li Ma, Zeyu Mi, Haibo Chen
MoE

Papers citing "Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters"

21 / 21 papers shown
Comet: Accelerating Private Inference for Large Language Model by Predicting Activation Sparsity
Guang Yan, Yuhui Zhang, Zimu Guo, Lutan Zhao, Xiaojun Chen, Chen Wang, Wenhao Wang, Dan Meng, Rui Hou
12 May 2025
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou, Zheng Li, J. Zhang, Jue Wang, Y. Wang, Zhongle Xie, Ke Chen, Lidan Shou
MoE
09 May 2025
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Zhenyu (Allen) Zhang, Zechun Liu, Yuandong Tian, Harshit Khaitan, Z. Wang, Steven Li
28 Apr 2025
Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash
Fucheng Jia, Zewen Wu, Shiqi Jiang, Huiqiang Jiang, Qianxi Zhang, Y. Yang, Yunxin Liu, Ju Ren, Deyu Zhang, Ting Cao
11 Apr 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng, Luoyang Sun, Jiwen Jiang, Yongcheng Zeng, Xinjian Wu, ..., Haoyang Li, Lei Chen, Lionel M. Ni, H. Zhang, Jun Wang
15 Mar 2025
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models
Xun Liang, Hanyu Wang, Huayi Lai, Simin Niu, Shichao Song, Jiawei Yang, Jihao Zhao, Feiyu Xiong, Bo Tang, Z. Li
VLM
10 Mar 2025
RWKV-Lite: Deeply Compressed RWKV for Resource-Constrained Devices
Wonkyo Choe, Yangfeng Ji, F. Lin
14 Dec 2024
Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking
Marco Federici, Davide Belli, M. V. Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul N. Whatmough
02 Dec 2024
PipeLLM: Fast and Confidential Large Language Model Services with Speculative Pipelined Encryption
Yifan Tan, Cheng Tan, Zeyu Mi, Haibo Chen
04 Nov 2024
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity
Yuqi Luo, Chenyang Song, Xu Han, Y. Chen, Chaojun Xiao, Zhiyuan Liu, Maosong Sun
04 Nov 2024
Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management
Tuowei Wang, Ruwen Fan, Minxing Huang, Zixu Hao, Kun Li, Ting Cao, Youyou Lu, Yaoxue Zhang, Ju Ren
25 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, ..., Qilin Zheng, Guanglei Zhou, Hai (Helen) Li, Yiran Chen
08 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, ..., Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai
06 Oct 2024
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
Hongyu Wang, Shuming Ma, Ruiping Wang, Furu Wei
MoE
15 Jul 2024
A Closer Look into Mixture-of-Experts in Large Language Models
Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, Jie Fu
MoE
26 Jun 2024
Unlocking Continual Learning Abilities in Language Models
Wenyu Du, Shuang Cheng, Tongxu Luo, Zihan Qiu, Zeyu Huang, Ka Chun Cheung, Reynold Cheng, Jie Fu
KELM, CLL
25 Jun 2024
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Zhenliang Xue, Yixin Song, Zeyu Mi, Le Chen, Yubin Xia, Haibo Chen
10 Jun 2024
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models
Je-Yong Lee, Donghyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini
12 Apr 2024
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song, Zeyu Mi, Haotong Xie, Haibo Chen
BDL
16 Dec 2023
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh, Keivan Alizadeh-Vahid, Sachin Mehta, C. C. D. Mundo, Oncel Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar
06 Oct 2023
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat
31 Dec 2020