PrivCode: When Code Generation Meets Differential Privacy

Zheng Liu
Chen Gong
Terry Yue Zhuo
Kecen Li
Weichen Yu
Matt Fredrikson
Tianhao Wang
Main: 12 pages · 9 figures · 15 tables · Bibliography: 3 pages · Appendix: 6 pages
Abstract

Large language models (LLMs) have demonstrated outstanding performance in code generation and completion. However, fine-tuning these models on private datasets raises privacy and proprietary concerns, such as the leakage of sensitive personal information. Differentially private (DP) code generation offers theoretical guarantees for protecting sensitive code: it produces synthetic datasets that preserve the statistical properties of the originals while bounding privacy leakage. Yet DP code generation faces significant challenges, due both to the strict syntactic dependencies of code and to the privacy-utility trade-off.
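As background for the DP guarantee the abstract refers to, the sketch below shows the standard Gaussian mechanism, a generic building block for (ε, δ)-differential privacy. This is an illustrative example only, not the paper's method; the function name and parameters are our own.

```python
import math
import random

def gaussian_mechanism(true_value, sensitivity, epsilon, delta):
    """Release a noisy statistic satisfying (epsilon, delta)-DP.

    Adds Gaussian noise calibrated to the query's L2 sensitivity.
    Generic DP primitive for illustration; not PrivCode's algorithm.
    """
    # Classic analytic calibration (valid for epsilon < 1):
    # sigma >= sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon
    sigma = math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon
    return true_value + random.gauss(0.0, sigma)

# Hypothetical usage: privately release a count over a code corpus,
# e.g. how many files contain a hard-coded credential (sensitivity 1).
noisy_count = gaussian_mechanism(42.0, sensitivity=1.0,
                                 epsilon=0.5, delta=1e-5)
```

In DP training pipelines the same idea appears as per-example gradient clipping (bounding sensitivity) followed by noise addition, which is what trades utility against the privacy budget ε.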
