An international team of researchers has integrated gene expression and disease association data to better understand the biological mechanisms of complex human diseases.
In a study led by Eric Gamazon, PhD, research instructor in Medicine at Vanderbilt, and Ayellet Segrè, PhD, of Harvard University, the team developed computational approaches to integrate the two types of data and link disease-associated genetic variants to gene expression in a broad collection of human tissues. They also identified hundreds of additional variants in the genome that predispose people to disease.
The findings were reported in the journal Nature Genetics.
Since the introduction of genome-wide association studies (GWAS), investigators around the world have identified thousands of genetic variants associated with a range of complex diseases, such as type 2 diabetes, Alzheimer’s disease and coronary artery disease.
“We’ve amassed all this information about what positions in the genome are predisposing us to disease, but we have not been as good at characterizing the underlying mechanisms,” Gamazon said.
Many of the disease-associated variants lie in “non-coding” regions of the genome, outside the genes that code for proteins.
“This has made it enormously challenging to characterize the biological mechanisms responsible for disease risk,” Gamazon said.
Gamazon and other members of the international Genotype-Tissue Expression (GTEx) Consortium have been building a resource database — an atlas of gene regulation across 44 human tissues built from more than 7,000 tissue samples.
“This is the most comprehensive database of gene expression across many tissues, and it is a unique resource for understanding what disease- and trait-associated genetic variants are doing,” Gamazon said.
Traits include quantifiable characteristics such as height and body mass index.
The GTEx project aims to use the database to characterize known disease- and trait-associated variants and to identify novel disease- and trait-associated variants. The current study advances both aims.
The investigators found that about 60 percent of known disease- and trait-associated genetic variants are associated with genetic variants that influence gene expression, and that more than half of those variants regulate two or more genes. They also found that variants regulating gene expression were enriched for disease associations.
“In contrast to GWAS, which today examine hundreds of thousands of samples, GTEx consists of only hundreds of samples,” Gamazon said. “What we’re finding though is that by ‘zooming in’ on those variants that actually control the expression of genes in relevant tissues, we are enriching for novel, previously unidentified disease-associated genetic variants.
“The GTEx resource is a powerful discovery tool.”
The investigators identified new genes associated with systolic blood pressure and coronary artery disease. They then replicated the findings using two large DNA biobanks linked to electronic health records, the UK Biobank and Vanderbilt’s BioVU, and found disease associations with the new genes in both.
“Our approach can be used to integrate a resource like GTEx and a DNA biobank linked to an electronic health records database to discover genetic predispositions to disease,” Gamazon said.
In the next phase of research, Gamazon expects the GTEx project to include more samples and tissues.
“Studies such as ours illustrate how ‘big data’ in genomics can be integrated to further our understanding of human disease, and can be used to propose novel therapeutic targets,” Gamazon said.
Other Vanderbilt authors of the study included Nancy Cox, PhD, director of the Vanderbilt Genetics Institute, and Anuar Konkashbaev, MS.
The research was supported in part by grants from the National Institutes of Health (MH101820, MH090937, MH113362, CA157823) and was aided by Gamazon’s fellowship at Clare Hall, University of Cambridge, England.