arXiv: 2311.09550
A Speed Odyssey for Deployable Quantization of LLMs

16 November 2023
Qingyuan Li, Ran Meng, Yiduo Li, Bo-Wen Zhang, Liang Li, Yifan Lu, Xiangxiang Chu, Yerui Sun, Yuchen Xie
MQ

Papers citing "A Speed Odyssey for Deployable Quantization of LLMs"

iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos, Tianrui Hu, Prasoon Sinha, N. Yadwadkar
VLM
54
0
0
08 Jan 2025
QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models
Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh
MQ, SyDa
115
21
0
13 Oct 2023
ZeroQuant-V2: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Z. Yao, Xiaoxia Wu, Cheng-rong Li, Stephen Youn, Yuxiong He
MQ
63
56
0
15 Mar 2023