by Paul Govern
A new study in the Journal of Biomedical Informatics uses machine learning on unlabeled electronic health record (EHR) data to shed light on the emergence of cardiovascular disease (CVD).
The study hinges on automated patient phenotyping (if eye color is a trait, blue eyes are a phenotype) and ample longitudinal data. Juan Zhao, PhD, Wei-Qi Wei, MD, PhD, and colleagues gathered 12,380 de-identified patient records that reached back at least 10 years before a CVD diagnosis. An automated scan found 1,068 distinct patient phenotypes in this dataset.
Aided by a technique called tensor decomposition, unsupervised machine learning revealed the long-term emergence of 14 distinct CVD patient subtypes. Across the six most prevalent subtypes, the risk of heart attack differed markedly, indicating the scan had captured meaningful distinctions.
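For readers curious what tensor decomposition looks like in practice, here is a minimal sketch, not the study's actual pipeline. It assumes the records can be arranged as a three-way array of patients by phenotypes by time windows, and it uses a generic CP (CANDECOMP/PARAFAC) decomposition fitted by alternating least squares. The data are synthetic, and the sizes, the two planted "subtypes" and all function names are illustrative assumptions.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product; row index is i * b.shape[0] + j."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)

def unfold(x, mode):
    """Flatten a 3-way tensor into a matrix with `mode` as the rows."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def cp_als(x, rank, n_iter=300, seed=0):
    """Fit a rank-`rank` CP model to a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second)
            # Least-squares update for this mode, holding the other two fixed.
            factors[mode] = unfold(x, mode) @ kr @ np.linalg.pinv(gram)
    return factors

# Synthetic data (hypothetical): 30 patients, 8 phenotypes, 5 time windows,
# with two planted subtypes of 15 patients each.
rng = np.random.default_rng(42)
true_patients = np.zeros((30, 2))
true_patients[:15, 0] = 1.0
true_patients[15:, 1] = 1.0
true_phenotypes = rng.random((8, 2))   # which phenotypes mark each subtype
true_time = rng.random((5, 2))         # how each subtype evolves over time
x = np.einsum("ir,jr,kr->ijk", true_patients, true_phenotypes, true_time)

factors = cp_als(x, rank=2)
approx = np.einsum("ir,jr,kr->ijk", *factors)
rel_err = np.linalg.norm(x - approx) / np.linalg.norm(x)

# Assign each patient to the component with the largest normalized loading.
patient_factor = factors[0] / np.linalg.norm(factors[0], axis=0)
subtype = np.abs(patient_factor).argmax(axis=1)

print("relative reconstruction error:", rel_err)
print("subtype assignment:", subtype)
```

Each component of the fitted model pairs a group of patients with a phenotype profile and a time course, which is the sense in which a decomposition of this kind can surface subtypes without labels. Real EHR data would be far sparser and noisier than this toy tensor, and a production analysis would likely add constraints such as non-negativity.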
Certain phenotypes that featured prominently in the scan, such as urinary infection, vitamin D deficiency and depression, would appear to challenge current understanding of the routes by which CVD emerges.
Zhao and Wei were joined in the study by Vanderbilt University Medical Center researchers from Biomedical Informatics, Clinical Pharmacology, and Allergy, Pulmonary, and Critical Care Medicine. The study was supported in part by the National Institutes of Health (GM115305, HL133786) and the American Heart Association.