ResearchTrend.AI · Paper 2403.01241 · Cited By
v2 (latest)

IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact

2 March 2024
Ruikang Liu, Haoli Bai, Haokun Lin, Yuening Li, Han Gao, Zheng-Jun Xu, Lu Hou, Jun Yao, Chun Yuan
MQ
ArXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (44★)

Papers citing "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact"

24 / 24 papers shown
SingleQuant: Efficient Quantization of Large Language Models in a Single Pass
Jinying Xiao, Bin Ji, Shasha Li, Xiaodong Liu, Ma Jun, Ye Zhong, Wei Li, Xuan Xie, Qingbo Wu, Jie Yu
MQ
111 · 0 · 0 · 27 Nov 2025
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
Haokun Lin, Haobo Xu, Yichen Wu, Ziyu Guo, Renrui Zhang, Zhichao Lu, Ying Wei, Gang Qu, Zhenan Sun
Diff · MM · MQ
184 · 9 · 0 · 20 Aug 2025
SurfaceLogicKV: Surface and Logic Attention Behaviors are All You Need for Robust KV Cache Compression
Mengjie Li, William J. Song
VLM
89 · 0 · 0 · 14 Aug 2025
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization
Aditya Tomar, Coleman Hooper, M Lee, Haocheng Xi, Rishabh Tiwari, Wonjun Kang, Luca Manolache, Michael W. Mahoney, Kurt Keutzer, A. Gholami
MQ
186 · 0 · 0 · 14 Aug 2025
LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Image and Video Generation
Lianwei Yang, Haokun Lin, Tianchen Zhao, Yichen Wu, Hongyu Zhu, Ruiqi Xie, Zhenan Sun, Yu Wang, Qingyi Gu
MQ
243 · 1 · 0 · 05 Aug 2025
KVmix: Gradient-Based Layer Importance-Aware Mixed-Precision Quantization for KV Cache
Fei Li, Song Liu, Weiguo Wu, Shiqiang Nie, Jinyu Wang
MQ
95 · 0 · 0 · 18 May 2025
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
Feijiang Han, Xiaodong Yu, Jianheng Tang, Delip Rao, Weihua Du, Lyle Ungar
376 · 6 · 0 · 16 May 2025
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, C. Yuan, Lu Hou
MQ · LRM
428 · 30 · 0 · 07 Apr 2025
SQuat: Subspace-orthogonal KV Cache Quantization
Hao Wang, Ligong Han, Kai Xu, Akash Srivastava
MQ
388 · 2 · 0 · 31 Mar 2025
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
International Conference on Learning Representations (ICLR), 2025
Xunhao Lai, Jianqiao Lu, Yao Luo, Yiyuan Ma, Xun Zhou
303 · 50 · 0 · 28 Feb 2025
Binary Neural Networks for Large Language Model: A Survey
Liangdong Liu, Zhitong Zheng, Cong Wang, TianHuang Su, ZhenYu Yang
MQ
279 · 2 · 0 · 26 Feb 2025
26 Feb 2025
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-ProbingInternational Conference on Learning Representations (ICLR), 2025
Qi Le
Enmao Diao
Ziyan Wang
Xinran Wang
Jie Ding
Li Yang
Ali Anwar
337
9
0
24 Feb 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou
441 · 9 · 0 · 10 Feb 2025
AKVQ-VL: Attention-Aware KV Cache Adaptive 2-Bit Quantization for Vision-Language Models
Zunhai Su, Wang Shen, Linge Li, Zhe Chen, Hanyu Wei, Huangqi Yu, Kehong Yuan
MQ
109 · 1 · 0 · 28 Jan 2025
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu, Jinyu Chen, Peirong Zheng, Xiaoquan Yi, Tianyi Tian, ..., Quan Wan, Yining Qi, Yunfeng Fan, Qinliang Su, Xuemin Shen
AI4CE
483 · 5 · 0 · 18 Dec 2024
Squeezed Attention: Accelerating Long Context Length LLM Inference
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Monishwaran Maheswaran, June Paik, Michael W. Mahoney, Kemal Kurniawan, Amir Gholami
608 · 32 · 0 · 14 Nov 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, Song Mei
305 · 25 · 0 · 17 Oct 2024
Harnessing Your DRAM and SSD for Sustainable and Accessible LLM Inference with Mixed-Precision and Multi-level Caching
Jie Peng, Zhang Cao, Huaizhi Qu, Zhengyu Zhang, Chang Guo, Yanyong Zhang, Zhichao Cao, Tianlong Chen
304 · 5 · 0 · 17 Oct 2024
AsymKV: Enabling 1-Bit Quantization of KV Cache with Layer-Wise Asymmetric Quantization Configurations
International Conference on Computational Linguistics (COLING), 2024
Qian Tao, Wenyuan Yu, Jingren Zhou
MQ
188 · 12 · 0 · 17 Oct 2024
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Chong Chen, Chuan Guo, Junli Cao, J. Ren, Sergey Tulyakov
VGen
488 · 85 · 0 · 14 Oct 2024
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun, Ruikang Liu, Haoli Bai, Han Bao, Kang Zhao, ..., Lu Hou, Chun Yuan, Xin Jiang, Wen Liu, Jun Yao
MQ
593 · 29 · 0 · 12 Oct 2024
AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization
Yifan Tan, Haoze Wang, Chao Yan, Yangdong Deng
MQ
245 · 5 · 0 · 25 Sep 2024
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
Zhongwei Wan, Ziang Wu, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan
VLM
284 · 69 · 0 · 26 Jun 2024
PTQ4DiT: Post-training Quantization for Diffusion Transformers
Junyi Wu, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, Yan Yan
MQ
306 · 42 · 0 · 25 May 2024