2505.14638
Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference
20 May 2025
Tomer Gafni
Asaf Karnieli
Yair Hanani
MQ
Papers citing
"Dual Precision Quantization for Efficient and Accurate Deep Neural Networks Inference"
4 / 4 papers shown
Title
SpinQuant: LLM quantization with learned rotations
International Conference on Learning Representations (ICLR), 2024
Zechun Liu
Changsheng Zhao
Igor Fedorov
Bilge Soran
Dhruv Choudhary
Raghuraman Krishnamoorthi
Vikas Chandra
Yuandong Tian
Tijmen Blankevoort
MQ
520
225
0
21 Feb 2025
Q-VLM: Post-training Quantization for Large Vision-Language Models
Neural Information Processing Systems (NeurIPS), 2024
Changyuan Wang
Ziwei Wang
Xiuwei Xu
Yansong Tang
Jie Zhou
Jiwen Lu
MQ
392
13
0
10 Oct 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Xinyu Fang
Junming Yang
Xiangyu Zhao
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
696
347
0
16 Jul 2024
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Chengyue Wu
Haotian Tang
Shang Yang
Zhekai Zhang
Guangxuan Xiao
Chuang Gan
Song Han
541
154
0
07 May 2024