Understanding Silent Data Corruption in LLM Training. Annual Meeting of the Association for Computational Linguistics (ACL), 2025
GCN-ABFT: Low-Cost Online Error Checking for Graph Convolutional Networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2024
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs. International Conference on Supercomputing (ICS), 2023
Winograd Convolution: A Perspective from Fault Tolerance. Design Automation Conference (DAC), 2022
FitAct: Error Resilient Deep Neural Networks via Fine-Grained Post-Trainable Activation Functions. Design, Automation and Test in Europe (DATE), 2021