arXiv:2311.05161
Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization
9 November 2023
Jangwhan Lee, Minsoo Kim, Seungcheol Baek, Seok Joong Hwang, Wonyong Sung, Jungwook Choi
MQ
Papers citing "Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization" (5 papers)
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization
Minsu Kim, Seongmin Hong, RyeoWook Ko, S. Choi, Hunjong Lee, Junsoo Kim, Joo-Young Kim, Jongse Park
24 Mar 2025
DecDEC: A Systems Approach to Advancing Low-Bit LLM Quantization
Y. Park, Jake Hyun, Hojoon Kim, Jae W. Lee
MQ
28 Dec 2024
AMXFP4: Taming Activation Outliers with Asymmetric Microscaling Floating-Point for 4-bit LLM Inference
Janghwan Lee, Jiwoong Park, Jinseok Kim, Yongjik Kim, Jungju Oh, Jinwook Oh, Jungwook Choi
15 Nov 2024
LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices
Jung Hyun Lee, Jeonghoon Kim, J. Yang, S. Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee
MQ
16 Jul 2024
A Structure-Aware Framework for Learning Device Placements on Computation Graphs
Shukai Duan, Heng Ping, Nikos Kanakaris, Xiongye Xiao, Panagiotis Kyriakis, ..., Guixiang Ma, Mihai Capota, Shahin Nazarian, Theodore L. Willke, Paul Bogdan
23 May 2024