Using large language models (LLMs) to generate high-stakes documents, such as informed consent forms (ICFs), remains a significant challenge because these documents must meet stringent requirements for regulatory compliance and factual accuracy. Here, we present InformGen, an LLM-driven copilot for accurate and compliant ICF drafting through optimized knowledge document parsing and content generation, with humans in the loop. We further construct a benchmark dataset comprising protocols and ICFs from 900 clinical trials. Experimental results demonstrate that InformGen achieves nearly 100% compliance with 18 core regulatory rules derived from FDA guidelines, outperforming a vanilla GPT-4o model by up to 30%. Additionally, a user study with five annotators shows that InformGen, when combined with manual intervention, attains over 90% factual accuracy, significantly surpassing the vanilla GPT-4o model's 57%-82%. Crucially, InformGen ensures traceability by providing inline citations to source protocols, which makes its output easy to verify against the source and supports factual integrity.
@article{wang2025_2504.00934,
  title={InformGen: An AI Copilot for Accurate and Compliant Clinical Research Consent Document Generation},
  author={Zifeng Wang and Junyi Gao and Benjamin Danek and Brandon Theodorou and Ruba Shaik and Shivashankar Thati and Seunghyun Won and Jimeng Sun},
  journal={arXiv preprint arXiv:2504.00934},
  year={2025}
}