December 16, 2021

Game theory points to new DNA data privacy solutions

Vanderbilt research is blazing a trail for the application of game theory to genomic and health data reidentification risk.

Information-based biomedical discovery, in particular the push toward precision medicine, depends on open-ended analysis of de-identified data from patients and research participants on the largest possible scale. Sharing data while controlling the risk of reidentification under privacy attack is vital to the enterprise.

Game theory indicates that only minimal edits are required to protect DNA data against attacks on anonymity, a health information privacy research team reported Dec. 10 in Science Advances.

Zhiyu Wan, PhD, Bradley Malin, PhD, and colleagues at Vanderbilt University Medical Center have in previous papers blazed a trail for the application of game theory to genomic and health data reidentification risk. Here, they demonstrate a game-theoretic method for protecting de-identified genomic data against attacks in which an adversary gathers information from different public sources to triangulate a target’s identity. For purposes of illustration, the paper takes particular aim at a method of attack published in Science in 2013, in which researchers used online public data sources to reidentify DNA test results obtained by querying a genetic genealogy company’s database.

In the masking game, as the authors call it, a research subject makes the opening move, sharing de-identified DNA data after masking selected data points. The equations and algorithms set out in the paper allow the subject to compute a rational adversary’s best response to every possible masking strategy. The adversary moves next, deciding whether or not to attack based on which data points have been masked, again using equations derived from game theory.
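The two moves described above can be illustrated with a toy sketch in Python. This is not the paper's model or code: the marker risks, the payoff amounts, the additive success model and every function name here are hypothetical, chosen only to show the leader-follower logic. The subject tries every masking strategy, predicts whether a rational adversary would attack in response, and keeps the strategy with the best payoff.

```python
from itertools import combinations

# Hypothetical toy inputs: none of these numbers come from the paper.
MARKER_RISK = [0.50, 0.20, 0.10, 0.05]  # assumed share of attack success due to each marker
MARKER_VALUE = 10.0   # assumed benefit to the subject for each marker shared
PRIVACY_LOSS = 100.0  # assumed loss to the subject if reidentified
ATTACK_GAIN = 50.0    # assumed gain to the adversary from a successful attack
ATTACK_COST = 18.0    # assumed cost to the adversary of mounting an attack


def success_probability(masked):
    """Assumed model: success chance is the summed risk of the unmasked markers."""
    return sum(r for i, r in enumerate(MARKER_RISK) if i not in masked)


def adversary_attacks(masked):
    """A rational adversary attacks only if the expected gain exceeds the cost."""
    return success_probability(masked) * ATTACK_GAIN > ATTACK_COST


def subject_payoff(masked):
    """Value of the shared markers, minus expected privacy loss if attacked."""
    payoff = (len(MARKER_RISK) - len(masked)) * MARKER_VALUE
    if adversary_attacks(masked):
        payoff -= success_probability(masked) * PRIVACY_LOSS
    return payoff


def best_masking():
    """Enumerate every masking strategy and keep the subject's best one."""
    n = len(MARKER_RISK)
    strategies = [frozenset(c) for k in range(n + 1)
                  for c in combinations(range(n), k)]
    return max(strategies, key=subject_payoff)


if __name__ == "__main__":
    best = best_masking()
    print(sorted(best), subject_payoff(best), adversary_attacks(best))
```

With these made-up numbers, masking only the single riskiest marker is enough to make an attack unprofitable, so the subject shares the other three. Exhaustive enumeration is practical only for a handful of markers; it is used here for clarity, not as a claim about how the paper searches the strategy space.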

“The goal of this research is to show how data holders in the real world — research teams and institutions, hospitals, government agencies, genetic genealogy companies — can use these methods to greatly improve the de-identification of genomic data entrusted to them by patients, research subjects and customers, shutting down the most likely sorts of attackers while optimizing the data’s usefulness under large-scale sharing for scientific research,” said Malin, professor of Biomedical Informatics, Biostatistics and Computer Science.

Comparing the masking game to other data sharing strategies, the paper examines data privacy and utility under a range of scenarios involving different data sets, attack models and levels of risk aversion. The paper provides a formal examination of real, as well as simulated, scenarios involving prospective monetary payoffs for the game’s players. The subject’s payoff is optimized by suppressing only enough data to make attacks unprofitable, leaving attackers with no reason to participate.

“Many data managers are prone to assume the worst-case scenario, an attacker with unlimited capability and no aversion to financial losses,” said Wan, a research fellow in the Health Information Privacy Laboratory at VUMC, where Malin is the lab’s director. “But that may not happen in the real world, so you would tend to overestimate the risk and not share anything. We developed an approach that gives a better estimate of the risk.”

Others on the study from Vanderbilt include Weiyi Xia, PhD, Yongtai Liu, Myrna Wooders, PhD, Jia Guo, Zhijun Yin, PhD, MS, and Ellen Wright Clayton, MD, JD. They were joined by Yevgeniy Vorobeychik, PhD, of Washington University in St. Louis, and Murat Kantarcioglu, PhD, of the University of Texas at Dallas. The study was supported in part by the National Institutes of Health (HG009034, HG006844, LM009989).