ResearchTrend.AI
arXiv:2312.05821 · Cited By
ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models (v5, latest)

10 December 2023
Zhihang Yuan
Yuzhang Shang
Yue Song
Dawei Yang
Qiang Wu
Yan Yan
Guangyu Sun
    MQ
arXiv (abs) · PDF · HTML · GitHub
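The listing below covers papers citing ASVD. For orientation, the cited paper's core idea is to weight the SVD reconstruction error by activation statistics before truncating. A minimal NumPy sketch, assuming a simple per-channel mean-absolute-activation scaling (variable names and the exact scaling rule are illustrative, not the paper's verbatim algorithm):

```python
import numpy as np

def asvd_compress(W, X, rank):
    """Rank-`rank` factorization of W with activation-aware column scaling."""
    # Per-input-channel activation magnitude from calibration data X
    # (shape: n_samples x in_features).
    s = np.abs(X).mean(axis=0) + 1e-8          # shape (in_features,)
    # Scale W's columns so the SVD spends its rank budget on channels
    # that carry large activations at runtime.
    U, sigma, Vt = np.linalg.svd(W * s, full_matrices=False)
    A = U[:, :rank] * sigma[:rank]             # (out_features, rank)
    B = Vt[:rank] / s                          # fold the scaling back: (rank, in_features)
    return A, B                                # W is approximated by A @ B
```

At full rank the factorization is exact (the scaling cancels); at lower rank the error concentrates on weakly activated channels rather than being distributed uniformly as in plain SVD.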

Papers citing "ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models"

50 / 78 papers shown
Low-Rank Prehab: Preparing Neural Networks for SVD Compression
Haoran Qin
Shansita D. Sharma
Ali Abbasi
Chayne Thrash
Soheil Kolouri
214
0
0
01 Dec 2025
Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs
Daniel Agyei Asante
Md Mokarram Chowdhury
Yang Li
158
0
0
27 Nov 2025
Globally optimized SVD compression of LLMs via Fermi-function-based rank selection and gauge fixing
Roman Rausch
David Jansen
Sukhbinder Singh
Roman Orus
68
2
0
26 Nov 2025
Optimal Singular Damage: Efficient LLM Inference in Low Storage Regimes
Mohammadsajad Alipour
Mohammad Mohammadi Amiri
133
0
0
04 Nov 2025
NeuronMM: High-Performance Matrix Multiplication for LLM Inference on AWS Trainium
Dinghong Song
Jierui Xu
Weichu Yang
Pengfei Su
Dong Li
227
0
0
29 Oct 2025
KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints
Kailin Jiang
Hongbo Jiang
Ning Jiang
Zhi Gao
Jinhe Bi
Yuchen Ren
B. Li
Yuntao Du
L. J. Liu
Qing Li
CLL · OffRL · KELM · VLM
262
6
0
22 Oct 2025
ARA: Adaptive Rank Allocation for Efficient Large Language Model SVD Compression
Lin Xv
Jingsheng Gao
Xian Gao
Ting Liu
Yuzhuo Fu
165
2
0
22 Oct 2025
Neuronal Group Communication for Efficient Neural Representation
Zhengqi Pei
Qingming Huang
Shuhui Wang
164
0
0
19 Oct 2025
QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models
Yutong Wang
Haiyu Wang
Sai Qian Zhang
129
1
0
18 Oct 2025
A Guardrail for Safety Preservation: When Safety-Sensitive Subspace Meets Harmful-Resistant Null-Space
Bingjie Zhang
Yibo Yang
Renzhe
Dandan Guo
Jindong Gu
Philip Torr
Bernard Ghanem
336
3
0
16 Oct 2025
Purifying Task Vectors in Knowledge-Aware Subspace for Model Merging
Bang An
Yibo Yang
Philip Torr
Bernard Ghanem
MoMe
203
1
0
16 Oct 2025
Neural Weight Compression for Language Models
Jegwang Ryu
Minkyu Kim
Seungjun Shin
Hee Min Choi
Dokwan Oh
Jaeho Lee
176
0
0
13 Oct 2025
Efficient Resource-Constrained Training of Transformers via Subspace Optimization
Le-Trung Nguyen
Enzo Tartaglione
Van-Tam Nguyen
186
0
0
10 Oct 2025
Boomerang Distillation Enables Zero-Shot Model Size Interpolation
Sara Kangaslahti
Nihal V. Nayak
Jonathan Geuter
Marco Fumero
Francesco Locatello
David Alvarez-Melis
211
1
0
06 Oct 2025
Accelerating Attention with Basis Decomposition
Jialin Zhao
193
0
0
02 Oct 2025
Layer-wise dynamic rank for compressing large language models
Zhendong Mi
Bian Sun
Grace Li Zhang
Shaoyi Huang
ALM
238
3
0
30 Sep 2025
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
Tianao Zhang
Zhiteng Li
Xianglong Yan
Haotong Qin
Yong Guo
Yulun Zhang
MQ
199
5
0
27 Sep 2025
Memory-Efficient Fine-Tuning via Low-Rank Activation Compression
Jiang-Xin Shi
Wen-Da Wei
Jin-Fei Qi
Xuanyu Chen
Tong Wei
Yu-Feng Li
176
0
0
27 Sep 2025
CoSpaDi: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning
Dmitriy Shopkhoev
Denis Makhov
Magauiya Zhussip
Ammar Ali
Stamatios Lefkimmiatis
240
3
0
26 Sep 2025
Understanding Post-Training Structural Changes in Large Language Models
Xinyu He
Xianghui Cao
243
0
0
22 Sep 2025
RL Fine-Tuning Heals OOD Forgetting in SFT
Hangzhan Jin
Sitao Luan
Sicheng Lyu
Guillaume Rabusseau
Reihaneh Rabbany
Doina Precup
Mohammad Hamdaqa
CLL · LRM
223
10
0
08 Sep 2025
CALR: Corrective Adaptive Low-Rank Decomposition for Efficient Large Language Model Layer Compression
Muchammad Daniyal Kautsar
Afra Majida Hariono
Widyawan
Syukron Abu Ishaq Alfarozi
Kuntpong Woraratpanya
197
0
0
21 Aug 2025
Importance-Aware Activation Space Reconstruction
Md Mokarram Chowdhury
Daniel Agyei Asante
E. Chang
Yang Li
194
0
0
04 Jul 2025
TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices
Mingxue Xu
Y. Xu
Danilo Mandic
216
0
0
16 Jun 2025
Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence
Jianlong Wu
Sihao Liu
Chuan Rao
Bang An
Tiancheng Shen
Juil Sock
Ming-Hsuan Yang
Bernard Ghanem
348
5
0
16 Jun 2025
ReCalKV: Low-Rank KV Cache Compression via Head Reordering and Offline Calibration
Xianglong Yan
Zhiteng Li
Tianao Zhang
Linghe Kong
Yulun Zhang
Yunbo Wang
554
4
0
30 May 2025
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
Banseok Lee
Dongkyu Kim
Youngcheon You
Youngmin Kim
MQ
326
10
0
30 May 2025
Mustafar: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Donghyeon Joo
Helya Hosseini
Ramyad Hadidi
Bahar Asgari
315
2
0
28 May 2025
Efficient Large Language Model Inference with Neural Block Linearization
Mete Erdogan
F. Tonin
Volkan Cevher
430
1
0
27 May 2025
TuneComp: Joint Fine-tuning and Compression for Large Foundation Models
Xiangyu Chen
Jing Liu
Ye Wang
Matthew Brand
Wang
T. Koike-Akino
287
0
0
27 May 2025
Multi-objective Large Language Model Alignment with Hierarchical Experts
Zhuo Li
Guodong DU
Weiyang Guo
Yigeng Zhou
Xiucheng Li
...
Fangming Liu
Yequan Wang
Deheng Ye
Min Zhang
Jing Li
ALM · MoE
414
4
0
27 May 2025
ResSVD: Residual Compensated SVD for Large Language Model Compression
Haolei Bai
Siyong Jian
Tuo Liang
Yu Yin
Huan Wang
410
5
0
26 May 2025
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
Peijie Dong
Zhenheng Tang
Xiang Liu
Lujun Li
Xiaowen Chu
Bo Li
533
14
0
26 May 2025
Tensorization is a powerful but underexplored tool for compression and interpretability of neural networks
Safa Hamreras
Sukhbinder Singh
Roman Orus
438
1
0
26 May 2025
μ-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
T. Koike-Akino
Jing Liu
Ye Wang
MoE
270
2
0
24 May 2025
LatentLLM: Attention-Aware Joint Tensor Compression
T. Koike-Akino
Xiangyu Chen
Jing Liu
Ye Wang
Wang
Matthew Brand
281
4
0
23 May 2025
A3: An Analytical Low-Rank Approximation Framework for Attention
Jeffrey T. H. Wong
Cheng Zhang
Xinye Cao
Pedro Gimenes
George A. Constantinides
Wayne Luk
Yiren Zhao
OffRL · MQ
456
4
0
19 May 2025
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
Computer Vision and Pattern Recognition (CVPR), 2025
Zhiyuan Chen
Keyi Li
Yifan Jia
Le Ye
Yufei Ma
DiffM
364
8
0
09 May 2025
Diffusion Model Quantization: A Review
Qian Zeng
Chenggong Hu
Weilong Dai
Jie Song
MQ
504
4
0
08 May 2025
Beyond Low-rank Decomposition: A Shortcut Approach for Efficient On-Device Learning
Le-Trung Nguyen
Ael Quélennec
Van-Tam Nguyen
Enzo Tartaglione
448
5
0
08 May 2025
Position: Enough of Scaling LLMs! Let's Focus on Downscaling
Ayan Sengupta
Tanmoy Chakraborty
517
6
0
02 May 2025
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators
Beichen Huang
Yueming Yuan
Zelei Shao
Minjia Zhang
MQ · MoE
487
2
0
03 Apr 2025
Large Language Model Compression via the Nested Activation-Aware Decomposition
Jun Lu
Tianyi Xu
Bill Ding
David Li
Yu Kang
265
1
0
21 Mar 2025
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Computer Vision and Pattern Recognition (CVPR), 2025
Mohsen Gholami
Mohammad Akbari
Kevin Cannons
Yong Zhang
360
4
0
07 Mar 2025
Universality of Layer-Level Entropy-Weighted Quantization Beyond Model Architecture and Size
Alireza Behtash
Marijan Fofonjka
Ethan Baird
Tyler Mauer
Hossein Moghimifam
David Stout
Joel Dennison
MQ
487
4
0
06 Mar 2025
Delta Decompression for MoE-based LLMs Compression
Hao Gu
Wei Li
Lujun Li
Qiyuan Zhu
Mark Lee
Shengjie Sun
Wei Xue
Wenhan Luo
MoE
410
28
0
24 Feb 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
Jiaqi Zhao
Ming Wang
Miao Zhang
Yuzhang Shang
Xuebo Liu
Yaowei Wang
Min Zhang
Liqiang Nie
MQ
746
6
0
18 Feb 2025
Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation
Martin Genzel
Patrick Putzky
Pengfei Zhao
Siyang Song
Mattes Mollenhauer
Robert Seidel
Stefan Dietzel
Thomas Wollmann
320
0
0
03 Feb 2025
Progressive Binarization with Semi-Structured Pruning for LLMs
Xinyu Yan
Tianao Zhang
Zhiteng Li
Yulun Zhang
MQ
680
5
0
03 Feb 2025
Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture
Yikun Hou
Suvrit Sra
A. Yurtsever
388
0
0
27 Jan 2025
Page 1 of 2