DebiasedDTA: Improving the Generalizability of Drug-Target Affinity Prediction Models

Abstract

Motivation: Computational models that accurately predict the binding affinity of an input protein-chemical pair can accelerate drug discovery studies. These models are trained on available protein-chemical interaction datasets, which may contain dataset biases that lead the model to learn dataset-specific patterns instead of generalizable relationships. As a result, prediction performance drops for previously unseen or novel biomolecules. Here, we present DebiasedDTA, a novel training framework for drug-target affinity (DTA) prediction models that addresses dataset biases to improve affinity prediction for novel biomolecules. DebiasedDTA reweights the training samples to mitigate the effect of dataset biases and is applicable to most DTA prediction models.

Results: The results show that DebiasedDTA can improve prediction performance on interactions between previously unseen molecules. Affinity prediction for previously encountered biomolecules also improves with debiasing. The experiments further show that DebiasedDTA can augment DTA prediction models with different input and model structures and can mitigate the effect of various dataset biases. Detailed analysis of the predictions shows that the proposed framework can also help to tackle insufficient learning from proteins, a problem that is known to be a barrier to achieving generalizable DTA prediction models.

Availability and Implementation: The source code, the models, and the datasets for reproduction are freely available for download at https://github.com/boun-tabi/debiaseddta-reproduce. The implementation is in Python 3 and is supported on Linux, macOS, and MS Windows.

Contact: arzucan.ozgur@boun.edu.tr, elif.ozkirimli@roche.com
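
The abstract states that DebiasedDTA reweights training samples but does not say how the weights are obtained. As a rough illustration only, the sketch below shows what a sample-weighted regression loss could look like in Python. The function name `weighted_mse` and the example weights are hypothetical and are not taken from the DebiasedDTA code; how the framework actually derives its weights is not described here.

```python
from typing import Sequence


def weighted_mse(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted mean squared error, normalized by the sum of the weights.

    Hypothetical helper for illustration; not part of DebiasedDTA.
    """
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_errors = (
        w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)
    )
    return sum(weighted_errors) / total_weight


# Toy affinities and predictions (made-up numbers, not from any dataset).
affinities = [5.0, 7.0, 6.0]
predictions = [5.5, 6.0, 6.0]

# Uniform weights reduce to the ordinary mean squared error.
uniform = weighted_mse(affinities, predictions, [1.0, 1.0, 1.0])

# Assumed weights that emphasize the second sample, e.g. one judged
# less explained by dataset bias (the weighting rule is an assumption).
reweighted = weighted_mse(affinities, predictions, [1.0, 3.0, 1.0])
```

With uniform weights the loss treats every interaction equally; raising the weight of a sample makes its error count more during training, which is the general mechanism that sample reweighting relies on.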
