Extracting accurate COVID data from scanned forms
Apr. 19, 2021, 8:00 AM
by Paul Govern
Clinical teams at Vanderbilt University Medical Center use electronic forms for patient intake, but for various operational reasons, teams devoted to COVID outpatient intake used paper assessment forms.
The forms, each with 141 fields, were scanned into the electronic health record upon completion. While these files served clinical documentation purposes, they might have posed a major obstacle to research, shutting off information that would normally have flowed as data points into a COVID research registry.
In Applied Clinical Informatics, Colin White-Dzuro, Daniel Fabbri, PhD, and colleagues report their use of optical mark/character recognition software (OMR/OCR) and a novel user interface (UI) to extract data from the scanned forms. They constructed their UI to display each scanned form and its editable OMR/OCR interpretation side by side, facilitating quick validation by crowdsourced health care workers. While average accuracy of OMR/OCR was 70%, interobserver agreement for validated documents was an estimated 97%.
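For readers who want a feel for the two figures quoted above, here is a minimal sketch of how field-level accuracy and interobserver agreement might be computed. It assumes simple exact-match comparison on each field; the article does not say how the study defined either metric, and the field names, sample values, and function names are hypothetical.

```python
from typing import Dict

# Hypothetical representation: one form as a mapping of field name -> value.
Form = Dict[str, str]


def field_accuracy(extracted: Form, validated: Form) -> float:
    """Share of validated fields whose OMR/OCR value matches the validated value."""
    if not validated:
        raise ValueError("validated form has no fields")
    matches = sum(
        1 for name, value in validated.items() if extracted.get(name) == value
    )
    return matches / len(validated)


def percent_agreement(reviewer_a: Form, reviewer_b: Form) -> float:
    """Share of fields on which two reviewers entered the same value."""
    fields = set(reviewer_a) | set(reviewer_b)
    if not fields:
        raise ValueError("no fields to compare")
    agree = sum(
        1 for name in fields if reviewer_a.get(name) == reviewer_b.get(name)
    )
    return agree / len(fields)


# Made-up sample data, not taken from the study.
ocr = {"fever": "yes", "cough": "no", "temp_f": "1O1.2", "onset_date": "2020-11-03"}
reviewer_a = {"fever": "yes", "cough": "yes", "temp_f": "101.2", "onset_date": "2020-11-03"}
reviewer_b = {"fever": "yes", "cough": "yes", "temp_f": "101.2", "onset_date": "2020-11-08"}

ocr_accuracy = field_accuracy(ocr, reviewer_a)
agreement = percent_agreement(reviewer_a, reviewer_b)
```

In this toy example the software gets two of four fields right, while the two reviewers agree on three of four. Simple percent agreement does not correct for chance; a statistic such as Cohen's kappa would, but whether the study used one is not stated here.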
This March, COVID teams at VUMC reverted to electronic forms for outpatient intake.
Also on the project were Jacob Schultz, Cheng Ye, Joseph Coco, Janet Myers, Claude Shackelford, MD, and Trent Rosenbloom, MD, MPH.