PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems

3 March 2026

Sudip Bhujel

LM&MA

MedIm


Main: 8 Pages

2 Figures

Bibliography: 2 Pages

7 Tables

Appendix: 3 Pages

Abstract

Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from doctor-patient conversations that may contain sensitive information. Conventional supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) can amplify memorization risks, enabling empirical membership inference and extraction of rare training-set content. We present PrivMedChat, an end-to-end framework for differentially private RLHF (DP-RLHF) for medical dialogue. Our design enforces differential privacy at every training stage that directly accesses dialogue-derived supervision: (i) Differentially Private Stochastic Gradient Descent (DP-SGD) for medical SFT and (ii) DP-SGD for reward model learning from preference pairs. To limit additional privacy expenditure during alignment, we apply DP-SGD to the PPO actor and critic when operating on dialogue-derived prompts, while the reward model remains fixed after DP training.
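For readers unfamiliar with DP-SGD, the mechanism named in the abstract can be illustrated with a minimal, generic sketch: each example's gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged before the parameter update. This is not the authors' implementation. The function names (`clip_gradient`, `dp_sgd_step`) and parameters (`clip_norm`, `noise_multiplier`) are hypothetical, chosen for illustration, and the sketch omits privacy accounting entirely.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One generic DP-SGD update (illustrative; no privacy accounting).

    params            : list of floats, current parameters
    per_example_grads : list of gradients, one list of floats per example
    noise_multiplier  : Gaussian noise std is noise_multiplier * clip_norm
    rng               : random.Random instance used to draw the noise
    """
    batch_size = len(per_example_grads)
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        clipped = clip_gradient(grad, clip_norm)
        summed = [s + c for s, c in zip(summed, clipped)]
    # Noise is added to the sum of clipped gradients, then the sum is averaged.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]
```

Under these assumptions, the same step would apply to any stage that touches dialogue-derived data: only the loss that produces the per-example gradients changes. In practice, a library that handles per-example gradients and privacy accounting would normally replace a hand-written loop like this one.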
