Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.00928
Cited By
FoldGPT: Simple and Effective Large Language Model Compression Scheme
1 July 2024
Songwei Liu
Chao Zeng
Lianqiang Li
Chenqian Yan
Lean Fu
Xing Mei
Fangmin Chen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FoldGPT: Simple and Effective Large Language Model Compression Scheme"
7 / 7 papers shown
Title
IteRABRe: Iterative Recovery-Aided Block Reduction
Haryo Akbarianto Wibowo
Haiyue Song
Hideki Tanaka
Masao Utiyama
Alham Fikri Aji
Raj Dabre
57
0
0
08 Mar 2025
Reassessing Layer Pruning in LLMs: New Insights and Methods
Yao Lu
Hao Cheng
Yujie Fang
Zeyu Wang
Jiaheng Wei
Dongwei Xu
Qi Xuan
Xiaoniu Yang
Zhaowei Zhu
61
0
0
23 Nov 2024
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers
Zehua Pei
Hui-Ling Zhen
Xianzhi Yu
Sinno Jialin Pan
M. Yuan
Bei Yu
AI4CE
84
0
0
21 Nov 2024
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
Aurick Qiao
Z. Yao
Samyam Rajbhandari
Yuxiong He
VLM
21
0
0
04 Oct 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Yujun Lin
Haotian Tang
Shang Yang
Zhekai Zhang
Guangxuan Xiao
Chuang Gan
Song Han
77
71
0
07 May 2024
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Saleh Ashkboos
Maximilian L. Croci
Marcelo Gennari do Nascimento
Torsten Hoefler
James Hensman
VLM
125
143
0
26 Jan 2024
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
4,424
0
23 Jan 2020
1