PrivCode: When Code Generation Meets Differential Privacy
Zheng Liu
Chen Gong
Terry Yue Zhuo
Kecen Li
Weichen Yu
Matt Fredrikson
Tianhao Wang
Main: 12 pages · 9 figures · 15 tables · Bibliography: 3 pages · Appendix: 6 pages
Abstract
Large language models (LLMs) have demonstrated outstanding performance in code generation and completion. However, fine-tuning these models on private datasets raises privacy and proprietary concerns, such as the leakage of sensitive personal information. Differentially private (DP) code generation offers theoretical guarantees for protecting sensitive code: it produces synthetic datasets that preserve the statistical properties of the original data while bounding privacy leakage. However, DP code generation faces significant challenges due to the strict syntactic dependencies of code and the privacy-utility trade-off.
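To make the DP guarantee mentioned above concrete, here is a minimal sketch of the classic Laplace mechanism for a counting query — a standard building block of differential privacy, not the paper's actual method (the function names and parameters below are illustrative assumptions):

```python
import math
import random


def laplace_noise(scale: float) -> float:
    # Sample from Laplace(0, scale) via inverse-CDF sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def dp_count(records, predicate, epsilon: float) -> float:
    """Release a count satisfying epsilon-DP via the Laplace mechanism.

    A counting query has sensitivity 1: adding or removing one record
    changes the true count by at most 1, so noise with scale 1/epsilon
    suffices for the epsilon-DP guarantee.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller `epsilon` means stronger privacy but noisier answers — the privacy-utility trade-off the abstract refers to; DP fine-tuning of LLMs applies the same principle to gradient updates rather than to a single count.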
