April 19, 2021

Extracting accurate COVID data from scanned forms

A switch to paper forms during the COVID-19 pandemic might have hobbled research, but using software with a novel user interface made it possible to accurately extract information from scanned forms.

Clinical teams at Vanderbilt University Medical Center use electronic forms for patient intake, but for various operational reasons, teams devoted to COVID outpatient intake used paper assessment forms.

The forms, each with 141 fields, were scanned into the electronic health record upon completion. While these files served clinical documentation purposes, they might have posed a major obstacle to research, shutting off information that would normally have flowed as data points into a COVID research registry.

In Applied Clinical Informatics, Colin White-Dzuro, Daniel Fabbri, PhD, and colleagues report their use of optical mark/character recognition software (OMR/OCR) and a novel user interface (UI) to extract data from the scanned forms. They constructed their UI to display each scanned form and its editable OMR/OCR interpretation side by side, facilitating quick validation by crowdsourced health care workers. While average accuracy of OMR/OCR was 70%, interobserver agreement for validated documents was an estimated 97%.

This March, COVID teams at VUMC reverted to electronic forms for outpatient intake.

Also on the project were Jacob Schultz, Cheng Ye, Joseph Coco, Janet Myers, Claude Shackelford, MD, and Trent Rosenbloom, MD, MPH.