People are known to misinterpret doctors’ use of risk quantifiers like rare, very rare, common, and so on. In medical contexts, artificial intelligence (AI) all too readily builds on and amplifies lay misunderstandings of terms like these, according to a study of large language models (LLMs) reported in JAMA Network Open by researchers at Vanderbilt Health.
When a doctor tells you a side effect is rare, they may be drawing on definitions such as those recommended for drug labeling by the European Commission, in which very rare means affecting up to 1 in 10,000 people, rare up to 1 in 1,000, uncommon up to 1 in 100, common up to 1 in 10, and very common more than 1 in 10. According to the National Institutes of Health, rare diseases are those affecting fewer than 200,000 Americans at any one time, which, at a U.S. population of roughly 330 million, works out to fewer than 1 per 1,650 population.
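The European Commission conventions amount to a simple threshold classification, as in this minimal sketch (the function name ec_risk_term and the encoding are illustrative assumptions, not from the study or the EC guideline):

```python
def ec_risk_term(frequency: float) -> str:
    """Map a side-effect frequency (a proportion, 0 to 1) to the EC label.

    Illustrative sketch of the labeling conventions described above;
    not code from the study.
    """
    if frequency > 1 / 10:
        return "very common"  # more than 1 in 10
    if frequency > 1 / 100:
        return "common"       # up to 1 in 10
    if frequency > 1 / 1_000:
        return "uncommon"     # up to 1 in 100
    if frequency > 1 / 10_000:
        return "rare"         # up to 1 in 1,000
    return "very rare"        # up to 1 in 10,000

# A side effect seen in 4% of patients -- the average numeric meaning
# the LLMs assigned to "rare" -- is actually "common" under the EC
# convention, since 1 in 100 < 0.04 <= 1 in 10.
print(ec_risk_term(0.04))    # -> common
print(ec_risk_term(0.0005))  # -> rare
```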
Nicholas Jackson, a biomedical informatics graduate student, and Jessica Ancker, PhD, MPH, professor of Biomedical Informatics, posed patient questions to ChatGPT-4o, Gemini 2.0, Grok 2.0, and Claude 3.5 Sonnet. Results were highly variable across the four models, with two broad patterns. First, although previous research by Ancker’s team shows that patients prefer doctors to use numbers when communicating risk, the LLMs often abstained from defining risk terms numerically, and did so more often when questions were worded anxiously or when the medical issue was severe. Second, when the LLMs did define their terms numerically, they drifted far from recommended usage: on average, rare meant affecting up to 4% of people, uncommon up to 12%, and common up to 36%, far above the recommended ceilings of 0.1%, 1%, and 10%, respectively.
Jackson and Ancker were joined by Katerina Andreadis, a PhD student at New York University. The research was supported by the National Library of Medicine under award T15LM007450.