FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference
8 January 2024
Zirui Liu
Qingquan Song
Q. Xiao
Sathiya Keerthi Selvaraj
Rahul Mazumder
Aman Gupta
Xia Hu
Papers citing "FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference"
7 / 7 papers shown
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
31
37
0
09 Jul 2024
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Yixin Song
Haotong Xie
Zhengyan Zhang
Bo Wen
Li Ma
Zeyu Mi
Haibo Chen
MoE
20
21
0
10 Jun 2024
SparseLLM: Towards Global Pruning for Pre-trained Language Models
Guangji Bai
Yijiang Li
Chen Ling
Kibaek Kim
Liang Zhao
14
6
0
28 Feb 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng
Lianmin Zheng
Binhang Yuan
Zhuohan Li
Max Ryabinin
...
Joseph E. Gonzalez
Percy Liang
Christopher Ré
Ion Stoica
Ce Zhang
138
208
0
13 Mar 2023
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
Itay Hubara
Brian Chmiel
Moshe Island
Ron Banner
S. Naor
Daniel Soudry
44
89
0
16 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018