GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Transkimmer: Transformer Learns to Layer-wise Skim. Annual Meeting of the Association for Computational Linguistics (ACL), 2022
SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation. International Conference on Learning Representations (ICLR), 2022
VELTAIR: Towards High-Performance Multi-tenant Deep Learning Services via Adaptive Compilation and Scheduling. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2022
Dual-side Sparse Tensor Core. International Symposium on Computer Architecture (ISCA), 2021