A new study in the American Journal of Human Genetics, led by Vanderbilt researchers Josh Denny, M.D., M.S., and Dana Crawford, Ph.D., takes random volumes of human genotypes, matches them with data siphoned from de-identified medical records, and sheds new light on the genetic basis of hypothyroidism, a common disease.
In a research lab, one thing more satisfying than a new discovery is a new discovery that inaugurates promising new research methodologies. Among other innovations, this study is the first genome-wide association study (GWAS) to reuse existing genotypes and clinical information from electronic medical records to study a new disease.
“Our premise was, let’s see if we can basically do a ‘no genotyping’ GWAS,” Denny said. “Can we use what’s already on the shelf, pick another disease, and analyze it within those samples?”
A GWAS tests for associations between a given disease and lots of common genetic variants — simple, single-letter variants that come up either heads or tails, each coin toss biased by heredity, each variant having its own historical ratio of heads to tails. Using DNA samples from hundreds of individuals with and without a given disease, a GWAS painstakingly tallies hundreds of thousands of these coin tosses from every region of every chromosome (using a device variously known as a DNA microarray or gene chip).
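To make the coin-toss picture concrete, here is a minimal sketch of one common way to test a single variant: a chi-square test on a two-by-two table of allele counts in cases and controls. This is an illustration only, not the study's actual pipeline. The function names, the variant labels and the counts are all made up for the example, and the 5e-8 cutoff is the conventional genome-wide significance threshold rather than a figure taken from the paper.

```python
import math

def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value, odds_ratio). Counts are assumed to be non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio

def scan(variants, threshold=5e-8):
    """Run the test on every variant and return the names that pass the threshold."""
    hits = []
    for name, counts in variants.items():
        _, p_value, _ = allelic_test(*counts)
        if p_value < threshold:
            hits.append(name)
    return hits

# Hypothetical counts: (case_alt, case_ref, ctrl_alt, ctrl_ref)
demo_variants = {
    "variant_demo_1": (60, 40, 40, 60),      # modest imbalance
    "variant_demo_2": (300, 100, 100, 300),  # strong imbalance
}
demo_hits = scan(demo_variants)
```

A real GWAS repeats this kind of test hundreds of thousands of times, which is why the significance threshold is set so much lower than the usual 0.05.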
The eMERGE Network (electronic MEdical Records and GEnomics) was formed to speed this type of discovery. It’s a national consortium of biorepositories, established in 2007, linking DNA samples to de-identified medical records. (Vanderbilt is the coordinating center for eMERGE.)
At the network’s five original sites, a total of five GWASes had already been performed on five disparate diseases and conditions. The new study successfully recycles that data, finding four variants on chromosome 9, near a gene that codes for a thyroid transcription factor, to be highly associated with primary hypothyroidism.
With a setup like eMERGE, the art of a GWAS comes in devising computer algorithms that can scan medical records and locate cases and controls, that is, individuals with and without a given disease. This is the first study to demonstrate the portability of a case-and-control selection algorithm across electronic medical records from multiple organizations.
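The article does not spell out the selection rules, so the following is only a rough sketch of what a rule-based case-and-control algorithm might look like. Everything in it is an assumption for illustration: the Record fields, the rule that a case needs both a diagnosis code and a matching medication, and the example code and drug name are not taken from the study.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A simplified, de-identified medical record (hypothetical structure)."""
    patient_id: str
    codes: set = field(default_factory=set)        # billing/diagnosis codes
    medications: set = field(default_factory=set)  # medication names

def classify(record, disease_codes, disease_meds):
    """Label a record as 'case', 'control' or 'excluded'.

    Assumed rule: a case has both a disease code and a disease medication,
    a control has neither, and anything ambiguous is left out.
    """
    has_code = bool(record.codes & disease_codes)
    has_med = bool(record.medications & disease_meds)
    if has_code and has_med:
        return "case"
    if not has_code and not has_med:
        return "control"
    return "excluded"

# Illustrative inputs only; a real algorithm would use far richer criteria.
DISEASE_CODES = {"244.9"}
DISEASE_MEDS = {"levothyroxine"}

records = [
    Record("p1", {"244.9"}, {"levothyroxine"}),
    Record("p2", {"401.9"}, {"lisinopril"}),
    Record("p3", {"244.9"}, set()),
]
labels = {r.patient_id: classify(r, DISEASE_CODES, DISEASE_MEDS) for r in records}
```

Excluding ambiguous records rather than forcing them into one group is a deliberate choice in this sketch: it trades sample size for cleaner cases and controls, which matters when the same rules must run unchanged at several institutions.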
“As time goes on, the number of things you can analyze with genotype sets and electronic medical record populations is limited only by the diseases that send people to their doctors,” Denny said.
A phenome-wide association study, or PheWAS, is a third Vanderbilt innovation used in this study. While a GWAS tests genotypes for associations with a given disease, a PheWAS does the converse, testing lots of clinical phenotypes — that is, lots of different diseases and their controls — for associations with a single genetic variant.
For their PheWAS, the team used an algorithm to probe the medical records of all genotyped individuals in eMERGE for more than 1,000 diseases and syndromes and their controls. Then they tested for statistical associations between each of these phenotypes and the single genetic variant that their GWAS had found to be most associated with primary hypothyroidism. The question was whether this one coin’s ratio of heads to tails would betray association with any other clinical phenotypes besides hypothyroidism.
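The logic of a PheWAS can be sketched in a few lines: hold one variant fixed, loop over many phenotypes, and test each for association. The version below is a simplified illustration, not the team's software. The phenotype names and counts are invented, the chi-square test stands in for whatever statistics the study actually used, and the Bonferroni correction for multiple testing is an assumption added for the example.

```python
import math

def chi2_p_value(a, b, c, d):
    """P-value of a 1-df chi-square test on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 1.0  # a degenerate table carries no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denominator
    return math.erfc(math.sqrt(chi2 / 2))

def phewas(counts_by_phenotype, alpha=0.05):
    """Test one variant against many phenotypes.

    counts_by_phenotype maps a phenotype name to allele counts for that
    phenotype: (case_alt, case_ref, ctrl_alt, ctrl_ref).
    Returns (phenotype, p_value, significant) tuples, smallest p-value first.
    """
    # Bonferroni correction: divide alpha by the number of phenotypes tested.
    threshold = alpha / len(counts_by_phenotype)
    results = []
    for phenotype, (a, b, c, d) in counts_by_phenotype.items():
        p_value = chi2_p_value(a, b, c, d)
        results.append((phenotype, p_value, p_value < threshold))
    return sorted(results, key=lambda row: row[1])

# Hypothetical counts for two made-up phenotypes.
demo_counts = {
    "phenotype_A": (60, 40, 40, 60),
    "phenotype_B": (50, 50, 50, 50),
}
demo_results = phewas(demo_counts)
```

With more than 1,000 phenotypes in play, some correction of this kind is what keeps chance associations from being mistaken for real ones.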
The PheWAS worked splendidly, returning the same hypothyroidism association as the GWAS, together with several additional associations that help shed light on the genetic basis of disease.
“I think this sets up a paradigm by which we look at GWAS studies with PheWAS, to follow up,” Denny said.