Privacy-Preserving Transformers: SwiftKey's Differential Privacy Implementation

Abstract
In this paper we train a transformer with differential privacy (DP) for language modeling in SwiftKey. We run multiple experiments to balance the trade-off between model size, runtime speed, and accuracy. We show small and consistent gains in next-word prediction and accuracy, with a graceful increase in memory and runtime compared to the production GRU. This is obtained by scaling down a GPT-2 architecture to fit the required size and by a two-stage training process that builds a seed model on general data and then fine-tunes it with DP on typing data. The transformer is integrated using ONNX, offering both flexibility and efficiency.
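
To make the two-stage recipe concrete, below is a minimal sketch of the second (DP fine-tuning) stage under several assumptions: TinyLM is a stand-in for the scaled-down transformer, the random batch stands in for private typing data, and all sizes and hyperparameters (CLIP_NORM, NOISE_MULT, LR) are illustrative rather than the paper's settings. Stage 1, training the seed model on general data without DP, is assumed to have already happened.

import torch
import torch.nn as nn

VOCAB, D_MODEL, SEQ_LEN = 1000, 64, 16          # illustrative sizes
CLIP_NORM, NOISE_MULT, LR = 1.0, 1.1, 1e-3      # illustrative DP-SGD settings

class TinyLM(nn.Module):
    """Small causal LM standing in for the scaled-down transformer."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB)

    def forward(self, ids):
        # Causal (upper-triangular -inf) mask so each position sees only the past.
        causal = torch.full((ids.size(1), ids.size(1)), float("-inf")).triu(1)
        return self.head(self.encoder(self.embed(ids), mask=causal))

model = TinyLM()          # stage 1 (seed training on general data) assumed done
loss_fn = nn.CrossEntropyLoss()
batch = torch.randint(0, VOCAB, (8, SEQ_LEN))   # stand-in for typing data

# Stage 2: one DP-SGD step -- clip each example's gradient, then add Gaussian noise.
params = [p for p in model.parameters() if p.requires_grad]
summed = [torch.zeros_like(p) for p in params]
for example in batch:                            # microbatches of size 1
    model.zero_grad()
    logits = model(example[None, :-1])
    loss = loss_fn(logits.reshape(-1, VOCAB), example[None, 1:].reshape(-1))
    loss.backward()
    grads = [p.grad.detach().clone() for p in params]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (CLIP_NORM / (norm + 1e-6)).clamp(max=1.0)   # per-example clipping
    for s, g in zip(summed, grads):
        s.add_(g * scale)

with torch.no_grad():
    for p, s in zip(params, summed):
        noise = torch.normal(0.0, NOISE_MULT * CLIP_NORM, size=p.shape)
        p.add_(-LR * (s + noise) / batch.size(0))        # noisy averaged update

The per-example loop only makes the clipping explicit; in practice a vectorized per-sample-gradient library such as Opacus would be used. The ONNX integration mentioned in the abstract can be sketched with a standard export call; the file name is illustrative and the sequence length is fixed here because the causal mask above is built at trace time:

dummy = torch.randint(0, VOCAB, (1, SEQ_LEN - 1))
torch.onnx.export(model, dummy, "dp_lm.onnx",
                  input_names=["input_ids"], output_names=["logits"],
                  opset_version=17)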
@article{abouelenin2025_2505.05648,
  title   = {Privacy-Preserving Transformers: SwiftKey's Differential Privacy Implementation},
  author  = {Abdelrahman Abouelenin and Mohamed Abdelrehim and Raffy Fahim and Amr Hendy and Mohamed Afify},
  journal = {arXiv preprint arXiv:2505.05648},
  year    = {2025}
}