Privacy-Preserving Transformers: SwiftKey's Differential Privacy Implementation

Abstract
In this paper we train a transformer with differential privacy (DP) for language modeling in SwiftKey. We run multiple experiments to balance the trade-off between model size, runtime speed, and accuracy. We show small and consistent gains in next-word prediction and accuracy, with a graceful increase in memory and runtime compared to the production GRU. This is obtained by scaling down a GPT-2 architecture to fit the required size and by a two-stage training process that builds a seed model on general data and then fine-tunes it with DP on typing data. The transformer is integrated using ONNX, offering both flexibility and efficiency.
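
To make the two-stage recipe concrete, below is a minimal sketch of the second (DP fine-tuning) stage under several assumptions: TinyLM is a stand-in for the scaled-down transformer, the random batch stands in for private typing data, and all sizes and hyperparameters (CLIP_NORM, NOISE_MULT, LR) are illustrative rather than the paper's settings. Stage 1, training the seed model on general data without DP, is assumed to have already happened.

import torch
import torch.nn as nn

VOCAB, D_MODEL, SEQ_LEN = 1000, 64, 16          # illustrative sizes
CLIP_NORM, NOISE_MULT, LR = 1.0, 1.1, 1e-3      # illustrative DP-SGD settings

class TinyLM(nn.Module):
    """Small causal LM standing in for the scaled-down transformer."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, ids):
        # Causal (upper-triangular -inf) mask so each position sees only the past.
        causal = torch.full((ids.size(1), ids.size(1)), float("-inf")).triu(1)
        return self.head(self.encoder(self.embed(ids), mask=causal))

model = TinyLM()          # stage 1 (seed training on general data) assumed done
loss_fn = nn.CrossEntropyLoss()
batch = torch.randint(0, VOCAB, (8, SEQ_LEN))   # stand-in for typing data

# Stage 2: one DP-SGD step -- clip each example's gradient, then add Gaussian noise.
params = [p for p in model.parameters() if p.requires_grad]
summed = [torch.zeros_like(p) for p in params]
for example in batch:                            # microbatches of size 1
    model.zero_grad()
    logits = model(example[None, :-1])
    loss = loss_fn(logits.reshape(-1, VOCAB), example[None, 1:].reshape(-1))
    loss.backward()
    grads = [p.grad.detach().clone() for p in params]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (CLIP_NORM / (norm + 1e-6)).clamp(max=1.0)   # per-example clipping
    for s, g in zip(summed, grads):
        s.add_(g * scale)

with torch.no_grad():
    for p, s in zip(params, summed):
        noise = torch.normal(0.0, NOISE_MULT * CLIP_NORM, size=p.shape)
        p.add_(-LR * (s + noise) / batch.size(0))        # noisy averaged update

The per-example loop only makes the clipping explicit; in practice a vectorized per-sample-gradient library such as Opacus would be used. The ONNX integration mentioned in the abstract can be sketched with a standard export call; the file name is illustrative and the sequence length is fixed here because the causal mask above is built at trace time:

dummy = torch.randint(0, VOCAB, (1, SEQ_LEN - 1))
torch.onnx.export(model, dummy, "dp_lm.onnx",
                  input_names=["input_ids"], output_names=["logits"],
                  opset_version=17)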
@article{abouelenin2025_2505.05648,
  title   = {Privacy-Preserving Transformers: SwiftKey's Differential Privacy Implementation},
  author  = {Abdelrahman Abouelenin and Mohamed Abdelrehim and Raffy Fahim and Amr Hendy and Mohamed Afify},
  journal = {arXiv preprint arXiv:2505.05648},
  year    = {2025}
}