Tech & Health

September 10, 2025

Death on the web

Cause of death often lies buried amid content on crowdfunding platforms, web-based obituaries and memorial websites. With AI assistance, gathering and combining this mortality data with medical records could help power research and public health.

Official sources of death statistics in the U.S. have lag times of nine months to two years. Obtaining this data from electronic patient records and health insurance claims databases might seem quicker, but for mortality reporting these sources have gaps and pose other difficulties. Meanwhile, bare-bones mortality information (decedent name, birth and death dates, cause of death) often lies buried amid everyday public content on crowdfunding platforms, web-based obituaries and memorial websites. Quick, low-cost collection and processing of this information, and its linkage with patient records, could aid large-scale health research and medical device safety monitoring, and could make public health measures more timely.

Mohammed Ali Al-Garadi, PhD, Ruth Reeves, PhD, and colleagues used natural language processing (NLP) techniques to collect mortality information from selected public websites and found that, in terms of understanding and annotating the raw information gathered, an open-source large language model (LLM) performed on a par with nurses trained as research assistants. They report their study in the Journal of Medical Internet Research.

The team’s NLP pipeline achieved an F1 score of 0.88, indicating a good balance between the relevance (precision) and completeness (recall) of retrieved information. Some 8.1 million retrieved documents were included in the analysis.
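For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below shows the standard formula only. The article does not report the pipeline's separate precision and recall, so the input values here are hypothetical and chosen purely for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Both inputs are fractions between 0 and 1. Returns 0.0 when both
    are zero, to avoid dividing by zero.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must be between 0 and 1")
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs, NOT figures from the study.
balanced = f1_score(0.88, 0.88)    # equal precision and recall
imbalanced = f1_score(0.80, 0.50)  # high precision, low recall

print(f"balanced:   {balanced:.3f}")
print(f"imbalanced: {imbalanced:.3f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a pipeline cannot earn a high F1 score by excelling at precision while neglecting recall, or the reverse.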

The team provided the LLM, Meta’s LLaMa-13, with several examples of how the retrieved information should be processed. For GoFundMe, the LLM, guided only by these examples, achieved 95.9% accuracy for identification of the primary cause of death, compared with 97.9% for nurse annotators. In obituaries, LLM accuracy was 96.5% for primary causes, while human accuracy was 99%. For memorial websites, LLM accuracy for primary causes was 98%, with human accuracy at 99.5%.
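Giving a model a handful of worked examples inside its prompt is commonly called few-shot prompting. The sketch below illustrates the general idea of assembling such a prompt; it is not the study team's actual prompt or code. The instruction wording, the `Example` class, the `build_few_shot_prompt` function and the sample texts are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked example: a source text and its annotated cause of death."""
    text: str
    cause_of_death: str


# Hypothetical instruction wording, not taken from the study.
INSTRUCTION = (
    "Extract the primary cause of death from the text. "
    "Answer 'unknown' if it is not stated."
)


def build_few_shot_prompt(examples: list[Example], document: str) -> str:
    """Assemble an instruction, the worked examples, and the new document."""
    parts = [INSTRUCTION, ""]
    for ex in examples:
        parts.append(f"Text: {ex.text}")
        parts.append(f"Primary cause of death: {ex.cause_of_death}")
        parts.append("")
    # The prompt ends with an open answer slot for the model to complete.
    parts.append(f"Text: {document}")
    parts.append("Primary cause of death:")
    return "\n".join(parts)


# Invented sample texts, not data from the study.
examples = [
    Example("She passed away after a long battle with pancreatic cancer.",
            "pancreatic cancer"),
    Example("He will be dearly missed by his family and friends.",
            "unknown"),
]

prompt = build_few_shot_prompt(
    examples, "Our father died suddenly of a heart attack last week."
)
print(prompt)
```

The resulting string would be sent to the language model, whose completion of the final line serves as the extracted annotation.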

The study was supported by the U.S. Food and Drug Administration. Others on the study from Vanderbilt include Michele LeNoue-Newton, PhD, Michael Matheny, MD, MS, MPH, Jill Whitaker, MSN, RN-BC, Jessica Deere, Michael McLemore, BSN, RN, Dax Westerman, MS, and Mirza Khan, MD.