Optimal Scheduling Algorithms for LLM Inference: Theory and Practice. Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 2025
Accurate KV Cache Quantization with Outlier Tokens Tracing. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Geometric Collaborative Filtering with Convergence. International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
SGLang: Efficient Execution of Structured Language Model Programs. Neural Information Processing Systems (NeurIPS), 2023