14
6

PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database

Abstract

Searchable symmetric encryption (SSE) has been used to protect the confidentiality of genomic data while providing substring search and range queries on a sequence of genomic data, but it has not been studied for protecting single nucleotide polymorphism (SNP)-phenotype data. In this article, we propose a novel model, PrivGenDB, for securely storing and efficiently conducting different queries on genomic data outsourced to an honest-but-curious cloud server. To instantiate PrivGenDB, we use SSE to ensure confidentiality while conducting different types of queries on encrypted genomic data, phenotype and other information of individuals to help analysts/clinicians in their analysis/care. To the best of our knowledge, PrivGenDB construction is the first SSE-based approach ensuring the confidentiality of shared SNP-phenotype data through encryption while making the computation/query process efficient and scalable for biomedical research and care. Furthermore, it supports a variety of query types on genomic data, including count queries, Boolean queries, and k'-out-of-k match queries. Finally, the PrivGenDB model handles the dataset containing both genotype and phenotype, and it also supports storing and managing other metadata like gender and ethnicity privately. Computer evaluations on a dataset with 5,000 records and 1,000 SNPs demonstrate that a count/Boolean query and a k'-out-of-k match query over 40 SNPs take approximately 4.3s and 86.4{\mu}s, respectively, that outperforms the existing schemes.

View on arXiv
Comments on this paper