Smoking status is an important variable in epidemiological and genetic studies of disease risk. In the past, questionnaires or interviews were used to ascertain tobacco use, but current studies are increasingly using de-identified versions of electronic medical records for genetic research.
William Bush, Ph.D., graduate student Laura Wiley and colleagues evaluated ICD-9 (International Classification of Diseases, Ninth Revision) tobacco use codes as identifiers of smoking status. The researchers compared smoking status determined by manual review of clinical records with two automated definitions: one based on ICD-9 codes and one based on natural language processing, which extracts smoking status from the narrative text of electronic medical records.
They report in the Journal of the American Medical Informatics Association that ICD-9 tobacco use codes effectively identify smokers in a general clinic population. They also found that transitions between “current” and “former” tobacco use codes correlated with smoking cessation attempts.
The results support using ICD-9 codes to adjust for smoking status in genetic studies that use electronic health records.
This research was supported in part by grants from the National Institutes of Health (GM080178, CA141307, HG004798).