Results are in: Artificial intelligence, straight out of the box, is better than your doctor at answering written questions of the sort routinely sent in by patients via patient portals.
In a blinded test at Vanderbilt University Medical Center, four primary care doctors rated actual physician responses alongside responses from various AI programs to a selection of rephrased patient questions concerning issues such as a possible bladder infection, sleep problems, a prescription renewal for back pain, flu symptoms, blood in the stool, and COVID-19. The research report from Siru Liu, PhD, Adam Wright, PhD, and colleagues appears in the Journal of the American Medical Informatics Association.
The AI programs ChatGPT-3.5 and ChatGPT-4 both handily outperformed the doctors across all four judging categories: empathy, accuracy, usefulness, and responsiveness. ChatGPT is a so-called large language model, or LLM, optimized for chat, created by OpenAI, a Microsoft-backed company based in San Francisco.
The stated intent behind the study is to develop AI that writes first-draft responses doctors could use to speed their work. According to the study, primary care doctors typically spend 1.5 hours per day processing patient messages.
The researchers also built two AI programs of their own by fine-tuning an open-source LLM to answer patient questions. In the blinded test, these fine-tuned programs again prevailed over the doctors, though by narrower margins.
The authors write that further fine-tuning will be needed to produce more natural and complete AI responses and to speed physician adoption.
Others on the study from VUMC include Allison McCoy, PhD, Aileen Wright, MD, MS, Babatunde Carew, MD, Sean Huang, MD, Josh Peterson, MD, MPH, and Bryan Steitz, PhD.
The study was supported by the National Institutes of Health (grants R00LM014097, R01AG062499, R01LM013995).