Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs

26 May 2025

Papers citing "Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs"


BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, ..., David Lo, Daniel Fried, Xiaoning Du, H. D. Vries, Leandro von Werra
224 193 0
22 Jun 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Authors: Chengyue Wu, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han
163 98 0
07 May 2024