Automation could speed high-throughput phenotyping of acute disease, a new study shows.
Electronic health records (EHRs) are a mainstay of observational clinical studies, epidemiological studies and disease surveillance. Teams often begin by training a machine learning (ML) algorithm to find cases and controls in the EHR population. Creating training sets for this supervised ML step is time-consuming because it involves manual annotation of records. PheNorm, introduced in 2018, is an automated solution for creating phenotyping algorithms, and its accuracy has been demonstrated for certain chronic diseases.
According to a report in the Journal of the American Medical Informatics Association, PheNorm algorithms can also work well for phenotyping acute disease. Working with records from Vanderbilt University Medical Center (VUMC) and Kaiser Permanente Washington (KPWA), Joshua Smith, PhD, David Carrell, PhD, and colleagues used PheNorm to develop algorithms for identifying symptomatic COVID-19 cases and controls from the pandemic’s first 12 months.
Among performance measures, the team used the area under the receiver operating characteristic curve (AUC), a measure of discrimination regardless of cut point, and positive predictive value (PPV), the ratio of true positives to total (true and false) positives. AUC was good, reaching 80% at both institutions, and PPV was better than that achieved with supervised ML, reaching 90% at VUMC and 85% at KPWA. Models developed at KPWA performed well at VUMC and vice versa.
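For readers less familiar with the two measures, the following is a minimal Python sketch of how AUC and PPV can be computed from a set of labeled records. It is an illustration only, not the study's code: the function names `auc` and `ppv`, the toy labels and scores, and the 0.5 cut point are all assumptions made for this example.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case scores higher than a randomly chosen control (ties count half)."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def ppv(labels, predictions):
    """Positive predictive value: true positives / (true + false positives)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fp == 0:
        raise ValueError("no positive predictions, PPV is undefined")
    return tp / (tp + fp)


# Toy data invented for illustration (1 = case, 0 = control).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]

# AUC uses the raw scores and does not depend on any cut point.
toy_auc = auc(labels, scores)

# PPV requires a cut point; 0.5 is an arbitrary choice for this example.
cut_point = 0.5
predictions = [1 if s >= cut_point else 0 for s in scores]
toy_ppv = ppv(labels, predictions)

print(f"AUC = {toy_auc:.3f}, PPV at cut point {cut_point} = {toy_ppv:.3f}")
```

Note that AUC summarizes ranking quality across all possible cut points, while PPV changes whenever the cut point moves, which is why the two are reported separately.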
Smith is assistant professor of Biomedical Informatics at VUMC, and Carrell is associate investigator at KPWA. Others on the study from VUMC include Daniel Park, Jill Whitaker, MSN, Michael McLemore, Joshua Osmanski, MS, and Robert Winter. The study was supported by the Food and Drug Administration and the National Institutes of Health (UL1TR002243).