LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
arXiv:2306.11222 · 20 June 2023
Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao
Papers citing "LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation"
50 / 52 papers shown
AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design
Yanbiao Liang, Huihong Shi, Haikuo Shao, Zhongfeng Wang · 13 · 0 · 0 · 07 Apr 2025
Towards Understanding and Improving Refusal in Compressed Models via Mechanistic Interpretability
Vishnu Kabir Chhabra, Mohammad Mahdi Khalili [AI4CE] · 28 · 0 · 0 · 05 Apr 2025
Penrose Tiled Low-Rank Compression and Section-Wise Q&A Fine-Tuning: A General Framework for Domain-Specific Large Language Model Adaptation
Chuan-Wei Kuo, Siyu Chen, Chenqi Yan, Yu Liu · 55 · 0 · 0 · 28 Mar 2025
IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining
Yixiao Li, Xianzhi Du, Ajay Jaiswal, Tao Lei, T. Zhao, Chong-Jun Wang, Jianyu Wang · 38 · 1 · 0 · 07 Mar 2025
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model
Wenke Huang, Jian Liang, Xianda Guo, Yiyang Fang, Guancheng Wan, ..., Bin Yang, He Li, Jiawei Shao, Mang Ye, Bo Du [OffRL, LRM, MLLM, KELM, VLM] · 63 · 1 · 0 · 06 Mar 2025
Identifying Sensitive Weights via Post-quantization Integral
Yuezhou Hu, Weiyu Huang, Zichen Liang, C. L. P. Chen, Jintao Zhang, J. Zhu, Jianfei Chen [MQ] · 37 · 2 · 0 · 28 Feb 2025
Delta Decompression for MoE-based LLMs Compression
Hao Gu, Wei Li, Lujun Li, Qiyuan Zhu, Mark Lee, Shengjie Sun, Wei Xue, Yike Guo [MoE] · 47 · 0 · 0 · 24 Feb 2025
R-LoRA: Random Initialization of Multi-Head LoRA for Multi-Task Learning
Jinda Liu, Yi-Ju Chang, Yuan Wu · 50 · 0 · 0 · 24 Feb 2025
Optimizing Singular Spectrum for Large Language Model Compression
Dengjie Li, Tiancheng Shen, Yao Zhou, Baisong Yang, Zhongying Liu, Masheng Yang, Bernard Ghanem, Yibo Yang, Yujie Zhong, Ming-Hsuan Yang · 63 · 0 · 0 · 24 Feb 2025
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Y. Liu, Jing Lin, Yiwu Yao, Rongrong Ji · 85 · 0 · 0 · 21 Feb 2025
FedSpaLLM: Federated Pruning of Large Language Models
Guangji Bai, Yijiang Li, Zilinghan Li, Liang Zhao, Kibaek Kim [FedML] · 57 · 3 · 0 · 20 Feb 2025
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang, M. Dehnavi · 62 · 5 · 0 · 28 Jan 2025
HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference
Peng Tang, Jiacheng Liu, X. Hou, Yifei Pu, Jing Wang, Pheng-Ann Heng, C. Li, M. Guo [MoE] · 57 · 6 · 0 · 03 Nov 2024
Magnitude Pruning of Large Pretrained Transformer Models with a Mixture Gaussian Prior
Mingxuan Zhang, Y. Sun, F. Liang · 24 · 0 · 0 · 01 Nov 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He, Philip N. Garner · 78 · 0 · 0 · 09 Oct 2024
Geometry is All You Need: A Unified Taxonomy of Matrix and Tensor Factorization for Compression of Generative Language Models
Mingxue Xu, Sadia Sharmin, Danilo P. Mandic · 14 · 2 · 0 · 03 Oct 2024
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu [AAML] · 38 · 21 · 0 · 26 Sep 2024
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Stephen Zhang, V. Papyan [VLM] · 38 · 1 · 0 · 20 Sep 2024
Practical token pruning for foundation models in few-shot conversational virtual assistant systems
Haode Qi, Cheng Qian, Jian Ni, Pratyush Singh, Reza Fazeli, Gengyu Wang, Zhongzheng Shu, Eric Wayne, Juergen Bross · 18 · 0 · 0 · 21 Aug 2024
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
Zhongyu Zhao, Menghang Dong, Rongyu Zhang, Wenzhao Zheng, Yunpeng Zhang, Huanrui Yang, Dalong Du, Kurt Keutzer, Shanghang Zhang · 46 · 0 · 0 · 15 Aug 2024
Eigen Attention: Attention in Low-Rank Space for KV Cache Compression
Utkarsh Saxena, Gobinda Saha, Sakshi Choudhary, Kaushik Roy · 21 · 8 · 0 · 10 Aug 2024
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
Ajay Jaiswal, Lu Yin, Zhenyu (Allen) Zhang, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang · 31 · 14 · 0 · 15 Jul 2024
Can Low-Rank Knowledge Distillation in LLMs be Useful for Microelectronic Reasoning?
N. Rouf, Fin Amin, Paul D. Franzon · 26 · 0 · 0 · 19 Jun 2024
RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning
Haoyu Wang, Tianci Liu, Ruirui Li, Monica Cheng, Tuo Zhao, Jing Gao · 23 · 6 · 0 · 16 Jun 2024
BlockPruner: Fine-grained Pruning for Large Language Models
Longguang Zhong, Fanqi Wan, Ruijun Chen, Xiaojun Quan, Liangzhi Li · 16 · 7 · 0 · 15 Jun 2024
VTrans: Accelerating Transformer Compression with Variational Information Bottleneck based Pruning
Oshin Dutta, Ritvik Gupta, Sumeet Agarwal · 31 · 1 · 0 · 07 Jun 2024
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining
Andi Han, Jiaxiang Li, Wei Huang, Mingyi Hong, Akiko Takeda, Pratik Jawanpuria, Bamdev Mishra · 28 · 9 · 0 · 04 Jun 2024
Surgical Feature-Space Decomposition of LLMs: Why, When and How?
Arnav Chavan, Nahush Lele, Deepak Gupta · 20 · 2 · 0 · 17 May 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu-Xiang Wang · 46 · 78 · 0 · 22 Apr 2024
LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models
Guangyan Li, Yongqiang Tang, Wensheng Zhang · 33 · 5 · 0 · 15 Apr 2024
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Fanxu Meng, Zhaohui Wang, Muhan Zhang [VLM] · 45 · 66 · 0 · 03 Apr 2024
AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models
Zeyu Liu, Souvik Kundu, Anni Li, Junrui Wan, Lianghao Jiang, P. Beerel · 23 · 9 · 0 · 20 Mar 2024
Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook
Xingchen Zou, Yibo Yan, Xixuan Hao, Yuehong Hu, Haomin Wen, ..., Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, Yuxuan Liang [HAI, AI4TS] · 43 · 35 · 0 · 29 Feb 2024
SparseLLM: Towards Global Pruning for Pre-trained Language Models
Guangji Bai, Yijiang Li, Chen Ling, Kibaek Kim, Liang Zhao · 14 · 6 · 0 · 28 Feb 2024
A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou [KELM, VLM] · 42 · 94 · 0 · 20 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang, Wei Chen, Yicong Luo, Yongliu Long, Zhengkai Lin, Liye Zhang, Binbin Lin, Deng Cai, Xiaofei He [MQ] · 33 · 30 · 0 · 15 Feb 2024
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu (Allen) Zhang, Zhangyang Wang, Yuejie Chi, Beidi Chen · 12 · 47 · 0 · 14 Feb 2024
Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy
Seyedarmin Azizi, M. Nazemi, Massoud Pedram [ViT, MQ] · 27 · 2 · 0 · 08 Feb 2024
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson [AAML] · 55 · 78 · 0 · 07 Feb 2024
Faster and Lighter LLMs: A Survey on Current Challenges and Way Forward
Arnav Chavan, Raghav Magazine, Shubham Kushwaha, M. Debbah, Deepak Gupta · 8 · 18 · 0 · 02 Feb 2024
Vaccine: Perturbation-aware Alignment for Large Language Model
Tiansheng Huang, Sihao Hu, Ling Liu · 42 · 32 · 0 · 02 Feb 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Hongyi Jin, Tianqi Chen, Zhihao Jia · 40 · 75 · 0 · 23 Dec 2023
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models
Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun [MQ] · 26 · 41 · 0 · 10 Dec 2023
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
Zhixu Du, Shiyu Li, Yuhao Wu, Xiangyu Jiang, Jingwei Sun, Qilin Zheng, Yongkai Wu, Ang Li, Hai Helen Li, Yiran Chen [MoE] · 10 · 11 · 0 · 29 Oct 2023
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao [MQ] · 28 · 117 · 0 · 12 Oct 2023
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Pingzhi Li, Zhenyu (Allen) Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen [MoMe] · 13 · 33 · 0 · 02 Oct 2023
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
Ayush Kaushal, Tejas Vaidhya, Irina Rish · 36 · 14 · 0 · 25 Sep 2023
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
Liang Li, Qingyuan Li, Bo-Wen Zhang, Xiangxiang Chu [MQ] · 19 · 28 · 0 · 06 Sep 2023
KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation
Marzieh S. Tahaei, Ella Charlaix, V. Nia, A. Ghodsi, Mehdi Rezagholizadeh · 41 · 22 · 0 · 13 Sep 2021
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou · 210 · 196 · 0 · 07 Feb 2020