PrivCode: When Code Generation Meets Differential Privacy

Zheng Liu
Chen Gong
Terry Yue Zhuo
Kecen Li
Weichen Yu
Matt Fredrikson
Tianhao Wang
Main: 12 pages · 9 figures · 15 tables · Bibliography: 3 pages · Appendix: 6 pages
Abstract

Large language models (LLMs) have demonstrated outstanding performance in code generation and completion. However, fine-tuning these models on private datasets raises privacy and proprietary concerns, such as the leakage of sensitive personal information. Differentially private (DP) code generation offers theoretical guarantees for protecting sensitive code: it produces synthetic datasets that preserve the statistical properties of the originals while bounding privacy leakage. Yet DP code generation faces significant challenges, due both to the strict syntactic dependencies of code and to the privacy-utility trade-off.
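As background for the DP guarantee the abstract refers to, the sketch below shows the standard Gaussian mechanism, a generic building block for (ε, δ)-differential privacy. This is an illustrative example only, not the paper's method; the function name and parameters are our own.

```python
import math
import random

def gaussian_mechanism(true_value, sensitivity, epsilon, delta):
    """Release a noisy statistic satisfying (epsilon, delta)-DP.

    Adds Gaussian noise calibrated to the query's L2 sensitivity.
    Generic DP primitive for illustration; not PrivCode's algorithm.
    """
    # Classic analytic calibration (valid for epsilon < 1):
    # sigma >= sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
    return true_value + random.gauss(0.0, sigma)

# Hypothetical usage: privately release a count over a code corpus,
# e.g. how many files contain a hard-coded credential (sensitivity 1).
noisy_count = gaussian_mechanism(42.0, sensitivity=1.0,
                                 epsilon=0.5, delta=1e-5)
```

In DP training pipelines the same idea appears as per-example gradient clipping (bounding sensitivity) followed by noise addition, which is what trades utility against the privacy budget ε.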
