Optimal Scheduling Algorithms for LLM Inference: Theory and Practice. Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 2025 |
SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding. Design Automation Conference (DAC), 2025 |