As chatbots powered by artificial intelligence become more ingrained in our everyday lives, people are increasingly turning to them for help diagnosing medical concerns. In situations where health is at stake, such as inquiries about suspicious rashes or insect bites, the accuracy of these AI-generated answers is paramount.

In response to this urgent need, researchers from Binghamton University, led by Ahmed Abdeen Hamed and Luis M. Rocha, have pioneered a verification method to dramatically improve the reliability of AI in medical inquiries. Their study, supported by a $100,000 grant from New York state’s Empire AI Consortium, highlights the potential of using large language models (LLMs) to eliminate “hallucinations,” or fabricated answers, in AI responses.

The researchers tested OpenAI’s ChatGPT, which identified disease terms, drug names, and genetic information with impressive accuracy but also produced a concerning number of inaccuracies. To address this, they developed a new protocol that applies retrieval-augmented generation (RAG) across seven different LLMs, so that the bots consult an authoritative medical database before responding.

This method involved running over 10,000 experiments where each chatbot was presented with identical plain-language symptoms and returned proposed medical terms accompanied by official identification numbers. The collective result was striking: 76.85% of the responses were corroborated by at least four chatbots, while the remaining 23.15% had support from at least two, effectively eliminating unmatched terms and hallucinations.
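
The agreement check described above can be illustrated with a short Python sketch. It is not the researchers' implementation: the model names, terms, and identification numbers are made-up placeholders, and the function names `consensus` and `support_share` are hypothetical. It assumes each chatbot's answer has already been reduced to a set of (term, identifier) pairs.

```python
from collections import Counter

def consensus(proposals, min_support=2):
    """Count how many models proposed each (term, identifier) pair.

    proposals maps a model name to the set of pairs it returned.
    Pairs backed by fewer than min_support models are dropped,
    which is how unmatched terms are filtered out in this sketch.
    """
    counts = Counter()
    for pairs in proposals.values():
        counts.update(set(pairs))  # each model votes at most once per pair
    return {pair: n for pair, n in counts.items() if n >= min_support}

def support_share(supported, threshold):
    """Fraction of surviving pairs backed by at least `threshold` models."""
    if not supported:
        return 0.0
    strong = sum(1 for n in supported.values() if n >= threshold)
    return strong / len(supported)

# Placeholder data: four hypothetical models, invented identifiers.
proposals = {
    "model_a": {("lyme disease", "ID:0001"), ("cellulitis", "ID:0002")},
    "model_b": {("lyme disease", "ID:0001"), ("cellulitis", "ID:0002")},
    "model_c": {("lyme disease", "ID:0001"), ("scabies", "ID:0003")},
    "model_d": {("lyme disease", "ID:0001")},
}

supported = consensus(proposals, min_support=2)
share_of_four = support_share(supported, threshold=4)
```

In this toy run, the pair proposed by only one model is discarded, and `share_of_four` plays the role of the 76.85% figure reported in the study: the portion of surviving terms that at least four models agree on.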

Hamed emphasized the protocol’s significance. “The new workflow is incredible,” he said, pointing to its promise of enhanced verification capabilities and insights into diseases, treatments, and clinical trials from a healthcare perspective.

A particularly appealing aspect of this protocol is its adaptability. Researchers can run numerous experiments with LLMs selected at random from an extensive pool of open-source alternatives, repeating the process to build confidence in the findings. Rocha noted the importance of the protocol in advancing complex network models of diseases, which aligns with his research on digital twins for precision medicine. These digital replicas are designed to simulate human reactions in real time, enabling healthcare providers to optimize treatment plans before engaging in real-world trials.

For example, the protocol could support multi-agent verification of adverse drug reactions by analyzing data from clinical trials, scientific literature, and even social media. Hamed and Rocha have already begun exploring its application in building models for ER+ breast cancer.

The collaboration between Hamed and Rocha proved to be pivotal. Hamed expressed gratitude for the support he received, particularly in securing grant funding and steering the research’s trajectory.

Although the focus of their study was on biomedical applications, the implications of this work stretch across various fields, offering a promising solution for reducing hallucinations in legal references, academic citations, and historical records produced by AI.

Hamed characterized the protocol as a key advancement in democratizing knowledge verification, encapsulating the potential of AI technologies. As he transitions to a research associate professor role at the University of Nebraska-Lincoln, he acknowledges Binghamton University’s role in shaping his research journey and expresses hope that their findings will inform a responsible and innovative future for generative AI and LLMs.