A Survey on Model Compression for Large Language Models


15 August 2023
Xunyu Zhu
Jian Li
Yong Liu
Can Ma
Weiping Wang

Papers citing "A Survey on Model Compression for Large Language Models"

48 / 48 papers shown
Stability in Single-Peaked Strategic Resource Selection Games
Henri Zeiler
21
3
0
09 May 2025
SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Hanyu Hu
Xiaoming Yuan
44
0
0
06 May 2025
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques
Sanjay Surendranath Girija
Shashank Kapoor
Lakshit Arora
Dipen Pradhan
Aman Raj
Ankit Shetgaonkar
47
0
0
05 May 2025
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Baoxia Du
H. Du
Dusit Niyato
Ruidong Li
51
0
0
05 May 2025
A Survey on Privacy Risks and Protection in Large Language Models
Kang Chen
Xiuze Zhou
Yuanguo Lin
Shibo Feng
Li Shen
Pengcheng Wu
AILaw
PILM
57
0
0
04 May 2025
ICQuant: Index Coding enables Low-bit LLM Quantization
Xinlin Li
Osama A. Hanna
Christina Fragouli
Suhas Diggavi
MQ
50
0
0
01 May 2025
Efficient LLMs with AMP: Attention Heads and MLP Pruning
Leandro Giusti Mugnaini
Bruno Yamamoto
Lucas Lauton de Alcantara
Victor Zacarias
Edson Bollis
Lucas Pellicer
A. H. R. Costa
Artur Jordao
37
0
0
29 Apr 2025
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Fahmida Liza Piya
Rahmatollah Beheshti
39
0
0
23 Apr 2025
When Reasoning Meets Compression: Benchmarking Compressed Large Reasoning Models on Complex Reasoning Tasks
Nan Zhang
Yusen Zhang
Prasenjit Mitra
Rui Zhang
MQ
LRM
44
2
0
02 Apr 2025
Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study
Eric Aubinais
Philippe Formont
Pablo Piantanida
Elisabeth Gassiat
38
0
0
10 Feb 2025
Symmetric Pruning of Large Language Models
Kai Yi
Peter Richtárik
AAML
VLM
57
0
0
31 Jan 2025
An Invitation to Neuroalgebraic Geometry
G. Marchetti
V. Shahverdi
Stefano Mereta
Matthew Trager
Kathlén Kohn
107
2
0
31 Jan 2025
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu
Huck Yang
Nai Chit Fung
Hongxu Yin
...
Jan Kautz
Yu-Chun Wang
Pavlo Molchanov
Min-Hung Chen
MQ
29
0
0
28 Oct 2024
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
Qitan Lv
Jie Wang
Hanzhu Chen
Bin Li
Yongdong Zhang
Feng Wu
HILM
17
3
0
19 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
36
3
0
08 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
52
13
0
06 Oct 2024
The Early Bird Catches the Leak: Unveiling Timing Side Channels in LLM Serving Systems
Linke Song
Zixuan Pang
Wenhao Wang
Zihao Wang
XiaoFeng Wang
Hongbo Chen
Wei Song
Yier Jin
Dan Meng
Rui Hou
38
6
0
30 Sep 2024
Hyper-Compression: Model Compression via Hyperfunction
Fenglei Fan
Juntong Fan
Dayang Wang
Jingbo Zhang
Zelin Dong
Shijun Zhang
Ge Wang
Tieyong Zeng
18
0
0
01 Sep 2024
MoDeGPT: Modular Decomposition for Large Language Model Compression
Chi-Heng Lin
Shangqian Gao
James Seale Smith
Abhishek Patel
Shikhar Tuli
Yilin Shen
Hongxia Jin
Yen-Chang Hsu
68
6
0
19 Aug 2024
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
Tianyu Liu
Yun Li
Qitan Lv
Kai Liu
Jianchen Zhu
Winston Hu
X. Sun
39
10
0
13 Aug 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
31
37
0
09 Jul 2024
Composable Interventions for Language Models
Arinbjorn Kolbeinsson
Kyle O'Brien
Tianjin Huang
Shanghua Gao
Shiwei Liu
...
Anurag J. Vaidya
Faisal Mahmood
Marinka Zitnik
Tianlong Chen
Thomas Hartvigsen
KELM
MU
75
5
0
09 Jul 2024
Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions
Shumaila Javaid
R. A. Khalil
Nasir Saeed
Bin He
Mohamed-Slim Alouini
32
8
0
05 Jul 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
52
0
0
04 Jul 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
41
2
0
12 Jun 2024
GPTA: Generative Prompt Tuning Assistant for Synergistic Downstream Neural Network Enhancement with LLMs
Xiao Liu
Jiawei Zhang
22
0
0
29 Mar 2024
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Xin Wang
Yu Zheng
Zhongwei Wan
Mi Zhang
MQ
55
43
0
12 Mar 2024
OneBit: Towards Extremely Low-bit Large Language Models
Yuzhuang Xu
Xu Han
Zonghan Yang
Shuo Wang
Qingfu Zhu
Zhiyuan Liu
Weidong Liu
Wanxiang Che
MQ
51
36
0
17 Feb 2024
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
Dayou Du
Yijia Zhang
Shijie Cao
Jiaqi Guo
Ting Cao
Xiaowen Chu
Ningyi Xu
MQ
41
28
0
16 Feb 2024
RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li
Xuewen Liu
Jing Zhang
Qingyi Gu
MQ
27
7
0
08 Feb 2024
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Saleh Ashkboos
Maximilian L. Croci
Marcelo Gennari do Nascimento
Torsten Hoefler
James Hensman
VLM
125
143
0
26 Jan 2024
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
Haojun Xia
Zhen Zheng
Xiaoxia Wu
Shiyang Chen
Zhewei Yao
...
Donglin Zhuang
Zhongzhu Zhou
Olatunji Ruwase
Yuxiong He
S. Song
MQ
25
14
0
25 Jan 2024
Large Language Models and Foundation Models in Smart Agriculture: Basics, Opportunities, and Challenges
Jiajia Li
Mingle Xu
Lirong Xiang
Dong Chen
Weichao Zhuang
Xunyuan Yin
Zhao Li
25
3
0
13 Aug 2023
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
Cheng-Yu Hsieh
Chun-Liang Li
Chih-Kuan Yeh
Hootan Nakhost
Yasuhisa Fujii
Alexander Ratner
Ranjay Krishna
Chen-Yu Lee
Tomas Pfister
ALM
204
498
0
03 May 2023
SCOTT: Self-Consistent Chain-of-Thought Distillation
Peifeng Wang
Zhengyang Wang
Zheng Li
Yifan Gao
Bing Yin
Xiang Ren
LRM
57
91
0
03 May 2023
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
Minghao Wu
Abdul Waheed
Chiyu Zhang
Muhammad Abdul-Mageed
Alham Fikri Aji
ALM
124
115
0
27 Apr 2023
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Z. Yao
Xiaoxia Wu
Cheng-rong Li
Stephen Youn
Yuxiong He
MQ
63
56
0
15 Mar 2023
DepGraph: Towards Any Structural Pruning
Gongfan Fang
Xinyin Ma
Mingli Song
Michael Bi Mi
Xinchao Wang
GNN
79
245
0
30 Jan 2023
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
165
320
0
06 Oct 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
204
1,451
0
02 Sep 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
245
671
0
06 Jan 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
4,424
0
23 Jan 2020
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktäschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
396
2,576
0
03 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
Neural Architecture Search with Reinforcement Learning
Barret Zoph
Quoc V. Le
264
5,290
0
05 Nov 2016