After a process of public comment and federal interdepartmental review, the Office for Civil Rights at the Department of Health and Human Services (HHS) recently issued new guidance on the de-identification of health records.
A great deal of biomedical and health services research at some point involves record de-identification, whether it’s prior to the sharing of study data or, in the case of large patient databases, prior to the initial extraction of study data.
Bradley Malin, Ph.D., associate professor of Biomedical Informatics and Computer Science, leads the Health Information Privacy Laboratory at Vanderbilt University Medical Center. His lab develops technology to de-identify health records while also quantifying the privacy protection afforded by the resulting, scientifically useful data sets.
Malin has been the government’s sole outside consultant on the question of de-identification. He helped organize a March 2010 conference in Washington, D.C., where he gathered perspectives from the public and from experts in law, policy, biostatistics and computer science. He then led an effort to draft the new guidance on development and use of de-identification methods. The final document is available on the HHS website.
Health records are increasingly computerized, and the patterns of disease and response to treatment that are hidden within these expanding databases make them a trove for biomedical discovery. With Malin’s assistance, Vanderbilt has created and maintains a de-identified health records database called the Synthetic Derivative, which serves a broad range of human-subject research.
The Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA), set forth in 2002, acknowledged that health information is privileged information, but it also recognized that such information is increasingly important for research and policy evaluation. To enable the re-use of health information while minimizing privacy intrusions, the rule stated that properly de-identified health information could be shared without patients’ consent and would not be subject to the oversight of HIPAA.
In its simplest form, de-identification requires stripping out dates of medical service, birth dates, geographic locations and other potential identifiers. However, research led by Malin has shown that this approach often undermines the scientific usefulness of a data set. Moreover, his research has shown that, depending on the size and makeup of a given database, stripping away such potential identifiers may not be fully effective in thwarting a determined attacker who might seek to re-identify individuals by combining de-identified health information with information from public databases.
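The simple approach described above, and the linkage risk that can remain afterward, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names, the sample records, and the choice of which fields count as identifiers are hypothetical. They are not drawn from the HHS guidance or from Malin's methods. The sketch strips a fixed set of fields, then measures what fraction of the remaining records are unique on a few leftover attributes, since those are the records an attacker could most easily match against an outside database.

```python
from collections import Counter

# Hypothetical field names; a real record schema and identifier list will differ.
DIRECT_IDENTIFIERS = {"name", "birth_date", "service_date", "zip_code"}
# Attributes that survive stripping but might still be linkable (an assumption).
QUASI_IDENTIFIERS = ("birth_year", "sex", "zip3")


def strip_identifiers(record):
    """Return a copy of the record without the fields listed as identifiers."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def unique_fraction(records, keys=QUASI_IDENTIFIERS):
    """Fraction of records whose combination of `keys` values appears only once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r.get(k) for k in keys) for r in records)
    singles = sum(1 for r in records if combos[tuple(r.get(k) for k in keys)] == 1)
    return singles / len(records)


# Made-up sample data, for illustration only.
records = [
    {"name": "A", "birth_date": "1970-01-02", "service_date": "2012-03-04",
     "zip_code": "37212", "birth_year": 1970, "sex": "F", "zip3": "372",
     "diagnosis": "asthma"},
    {"name": "B", "birth_date": "1970-06-15", "service_date": "2012-05-06",
     "zip_code": "37203", "birth_year": 1970, "sex": "F", "zip3": "372",
     "diagnosis": "diabetes"},
    {"name": "C", "birth_date": "1985-09-30", "service_date": "2012-07-08",
     "zip_code": "37215", "birth_year": 1985, "sex": "M", "zip3": "372",
     "diagnosis": "asthma"},
]

stripped = [strip_identifiers(r) for r in records]
risk = unique_fraction(stripped)  # share of records unique on the leftover attributes
```

In this toy data, one of the three stripped records is still unique on birth year, sex and three-digit ZIP code, which shows how removing the obvious fields does not by itself rule out re-identification in a small or unusual database.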
With these considerations in view, the government’s Privacy Rule left the door open for experts to devise and apply novel methods for de-identification that get the job done while posing fewer roadblocks to biomedical discovery.
“When you’re dealing with science and policy questions that involve large databases — millions of records — there needs to be technology that automates the de-identification process without unnecessarily hobbling legitimate inquiry,” Malin said.
The problem has been that the Privacy Rule didn’t provide clear standards or guidance regarding novel methods of de-identification. This vagueness has sown uncertainty among health information managers and their legal counsel.
“The original privacy rule left a lot of issues unresolved and in a gray state. This guidance document is not going to completely address all those issues, but it will provide clarification on certain ambiguities and help reason about what it means to de-identify health information. However, some issues are going to require either further policymaking or case law,” Malin said.
Faculty and staff who have questions about health information de-identification should contact the Vanderbilt Institute for Clinical and Translational Research, 322-7343.