LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services

International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024

3 October 2024

Małgorzata Łazuka

Andreea Anghel

Thomas Parnell

Papers citing "LLM-Pilot: Characterize and Optimize Performance of your LLM Inference Services"

Reasoning Language Model Inference Serving Unveiled: An Empirical Study

21 Oct 2025

Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective

Tianyao Shi

Yi Ding

22 Aug 2025

The Hitchhikers Guide to Production-ready Trustworthy Foundation Model powered Software (FMware)

Kirill Vasilevski

Benjamin Rombaut

Gopi Krishnan Rajbahadur

...

Kishanthan Thangarajah

Ahmed E. Hassan

Zhen Ming

15 May 2025

Unveiling the Landscape of LLM Deployment in the Wild: An Empirical Study

05 May 2025

Taming the Titans: A Survey of Efficient LLM Inference Serving

28 Apr 2025

ScreenLLM: Stateful Screen Schema for Efficient Action Understanding and Prediction

The Web Conference (WWW), 2025

26 Mar 2025

From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap

Gopi Krishnan Rajbahadur

G. Oliva

Dayi Lin

Ahmed E. Hassan

28 Jan 2025

Software Performance Engineering for Foundation Model-Powered Software (FMware)

Haoxiang Zhang

Shi Chang

Arthur Leung

Kishanthan Thangarajah

Boyuan Chen

Hanan Lutfiyya

Ahmed E. Hassan

14 Nov 2024