42

MolProphecy: Bridging Medicinal Chemists' Knowledge and Molecular Pre-Trained Models via a Multi-Modal Framework

Jianping Zhao
Qiong Zhou
Tian Wang
Yusi Fan
Qian Yang
Li Jiao
Chang Liu
Zhehao Guo
Qi Lu
Fengfeng Zhou
Ruochi Zhang
Main:13 Pages
7 Figures
Bibliography:3 Pages
4 Tables
Abstract

MolProphecy is a human-in-the-loop (HITL) multi-modal framework designed to integrate chemists' domain knowledge into molecular property prediction models. While molecular pre-trained models have enabled significant gains in predictive accuracy, they often fail to capture the tacit, interpretive reasoning central to expert-driven molecular design. To address this, MolProphecy employs ChatGPT as a virtual chemist to simulate expert-level reasoning and decision-making. The generated chemist knowledge is embedded by the large language model (LLM) as a dedicated knowledge representation and then fused with graph-based molecular features through a gated cross-attention mechanism, enabling joint reasoning over human-derived and structural features. Evaluated on four benchmark datasets (FreeSolv, BACE, SIDER, and ClinTox), MolProphecy outperforms state-of-the-art (SOTA) models, achieving a 15.0 percent reduction in RMSE on FreeSolv and a 5.39 percent improvement in AUROC on BACE. Analysis reveals that chemist knowledge and structural features provide complementary contributions, improving both accuracy and interpretability. MolProphecy offers a practical and generalizable approach for collaborative drug discovery, with the flexibility to incorporate real chemist input in place of the current simulated proxy--without the need for model retraining. The implementation is publicly available atthis https URL.

View on arXiv
Comments on this paper