arXiv: 2312.05821
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models
10 December 2023
Zhihang Yuan, Yuzhang Shang, Yue Song, Qiang Wu, Yan Yan, Guangyu Sun [MQ]
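The paper's title describes its core recipe: weight matrices are scaled by per-channel activation statistics before a truncated SVD, so the low-rank factors preserve the directions that matter for the actual inputs. Below is a minimal NumPy sketch of that general idea; the scaling rule, shapes, and calibration data are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

# Illustrative sketch of activation-aware low-rank compression in the
# spirit of ASVD. Assumption: scale weight columns by per-channel
# activation magnitudes, SVD the scaled weight, then fold the scaling
# back. This is a simplified stand-in, not the paper's implementation.

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))  # weight matrix (out_features x in_features)
# Calibration activations with strongly non-uniform channel scales.
X = rng.standard_normal((1000, 128)) * np.linspace(0.1, 10.0, 128)

# Per-channel activation scale (one illustrative choice of statistic).
s = np.abs(X).mean(axis=0) + 1e-6

k = 16  # target rank

# Baseline: plain truncated SVD of W.
U, sv, Vt = np.linalg.svd(W, full_matrices=False)
W_plain = (U[:, :k] * sv[:k]) @ Vt[:k]

# Activation-aware: SVD of the column-scaled weight W @ diag(s),
# then divide the scaling back out of the reconstruction.
Us, svs, Vts = np.linalg.svd(W * s, full_matrices=False)
W_asvd = ((Us[:, :k] * svs[:k]) @ Vts[:k]) / s

# Error measured on the calibration activations: the activation-aware
# factorization tracks X @ W.T more closely at the same rank.
err_plain = np.linalg.norm(X @ W_plain.T - X @ W.T)
err_asvd = np.linalg.norm(X @ W_asvd.T - X @ W.T)
```

Both factorizations have rank k, but the activation-aware one allocates its budget to channels the inputs actually excite, which is why it yields a lower output error than plain SVD when channel scales are skewed.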
Papers citing "ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models" (39 of 39 papers shown):
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition (09 May 2025)
Zhiyuan Chen, Keyi Li, Yifan Jia, Le Ye, Yufei Ma [DiffM]

Diffusion Model Quantization: A Review (08 May 2025)
Qian Zeng, Chenggong Hu, Mingli Song, Jie Song [MQ]
Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning (08 May 2025)
Le-Trung Nguyen, Ael Quélennec, Van-Tam Nguyen, Enzo Tartaglione

Position: Enough of Scaling LLMs! Lets Focus on Downscaling (02 May 2025)
Ayan Sengupta, Yash Goel, Tanmoy Chakraborty

MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators (03 Apr 2025)
Beichen Huang, Yueming Yuan, Zelei Shao, Minjia Zhang [MQ, MoE]
Large Language Model Compression via the Nested Activation-Aware Decomposition (21 Mar 2025)
Jun Lu, Tianyi Xu, Bill Ding, David Li, Yu Kang

CASP: Compression of Large Multimodal Models Based on Attention Sparsity (07 Mar 2025)
Mohsen Gholami, Mohammad Akbari, Kevin Cannons, Yong Zhang

Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size (06 Mar 2025)
Alireza Behtash, Marijan Fofonjka, Ethan Baird, Tyler Mauer, Hossein Moghimifam, David Stout, Joel Dennison [MQ]
Delta Decompression for MoE-based LLMs Compression (24 Feb 2025)
Hao Gu, Wei Li, Lujun Li, Qiyuan Zhu, Mark Lee, Shengjie Sun, Wei Xue, Yike Guo [MoE]

Progressive Binarization with Semi-Structured Pruning for LLMs (03 Feb 2025)
X. Yan, Tianao Zhang, Zhiteng Li, Yulun Zhang [MQ]

Choose Your Model Size: Any Compression by a Single Gradient Descent (03 Feb 2025)
Martin Genzel, Patrick Putzky, Pengfei Zhao, S., Mattes Mollenhauer, Robert Seidel, Stefan Dietzel, Thomas Wollmann

You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning (25 Jan 2025)
Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models (08 Dec 2024)
Fan Wang, Juyong Jiang, Chansung Park, Sunghun Kim, Jing Tang

HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference (03 Nov 2024)
Peng Tang, Jiacheng Liu, X. Hou, Yifei Pu, Jing Wang, Pheng-Ann Heng, C. Li, M. Guo [MoE]

BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments (31 Oct 2024)
Xinghao Wang, Pengyu Wang, Bo Wang, Dong Zhang, Yunhua Zhou, Xipeng Qiu [MQ]

EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation (28 Oct 2024)
Shih-yang Liu, Huck Yang, Nai Chit Fung, Hongxu Yin, ..., Jan Kautz, Yu-Chun Wang, Pavlo Molchanov, Min-Hung Chen [MQ]
Beware of Calibration Data for Pruning Large Language Models (23 Oct 2024)
Yixin Ji, Yang Xiang, Juntao Li, Qingrong Xia, Ping Li, Xinyu Duan, Zhefeng Wang, Min Zhang

ESPACE: Dimensionality Reduction of Activations for Model Compression (07 Oct 2024)
Charbel Sakr, Brucek Khailany

ARB-LLM: Alternating Refined Binarizations for Large Language Models (04 Oct 2024)
Zhiteng Li, X. Yan, Tianao Zhang, Haotong Qin, Dong Xie, Jiang Tian, Zhongchao Shi, Linghe Kong, Yulun Zhang, Xiaokang Yang [MQ]

Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression (02 Oct 2024)
Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, Grace Li Zhang
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner (19 Sep 2024)
Yuzhang Shang, Bingxin Xu, Weitai Kang, Mu Cai, Yuheng Li, Zehao Wen, Zhen Dong, Kurt Keutzer, Yong Jae Lee, Yan Yan

MoDeGPT: Modular Decomposition for Large Language Model Compression (19 Aug 2024)
Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

ThinK: Thinner Key Cache by Query-Driven Pruning (30 Jul 2024)
Yuhui Xu, Zhanming Jie, Hanze Dong, Lei Wang, Xudong Lu, Aojun Zhou, Amrita Saha, Caiming Xiong, Doyen Sahoo

From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients (15 Jul 2024)
Ajay Jaiswal, Lu Yin, Zhenyu (Allen) Zhang, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression (21 Jun 2024)
Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, ..., Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang [MQ]

AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers (13 Jun 2024)
Emil Biju, Anirudh Sriram, Mert Pilanci

CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning (07 Jun 2024)
Yibo Yang, Xiaojie Li, Zhongzhu Zhou, S. Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models (28 May 2024)
Xing Hu, Yuan Cheng, Dawei Yang, Zhihang Yuan, Jiangyong Yu, Chen Xu, Sifan Zhou [MQ]
Matryoshka Multimodal Models (27 May 2024)
Mu Cai, Jianwei Yang, Jianfeng Gao, Yong Jae Lee [VLM]

PTQ4DiT: Post-training Quantization for Diffusion Transformers (25 May 2024)
Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan [MQ]

A Survey on Efficient Inference for Large Language Models (22 Apr 2024)
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu-Xiang Wang

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models (22 Mar 2024)
Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan [VLM]
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression (12 Mar 2024)
Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang [MQ]

LLM Inference Unveiled: Survey and Roofline Model Insights (26 Feb 2024)
Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, ..., Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications (07 Feb 2024)
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson [AAML]

A Survey on Model Compression for Large Language Models (15 Aug 2023)
Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang
Tensor Networks Meet Neural Networks: A Survey and Future Perspectives (22 Jan 2023)
Maolin Wang, Y. Pan, Zenglin Xu, Xiangli Yang, Guangxi Li, Andrzej Cichocki

Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations (26 May 2022)
Steffen Schotthöfer, Emanuele Zangrando, J. Kusch, Gianluca Ceruti, Francesco Tudisco

Initialization and Regularization of Factorized Neural Layers (03 May 2021)
M. Khodak, Neil A. Tenenholtz, Lester W. Mackey, Nicolò Fusi