The adoption of artificial intelligence (AI) in healthcare is rapidly advancing, presenting both opportunities and challenges. While leaders in healthcare often express optimism regarding AI innovation, many information technology professionals harbor concerns about the trustworthiness of these tools. Given the critical nature of healthcare, there is an urgent need to determine the reliability of AI-generated outputs.
Recently, there has been a push from various groups to implement confidence scores as a measure of AI’s reliability in a medical context. These scores, often derived from statistical models rather than validated probabilities, can be misleading. This is particularly critical in healthcare, where large language models (LLMs) may produce scores that do not accurately reflect true reliability, fostering an unfounded sense of certainty among practitioners.
As a healthcare technology professional with a keen interest in AI developments, I contend that trusting confidence scores carries significant risks. In the following sections, I describe these risks and propose more effective alternatives that enable AI usage while safeguarding organizational integrity.
Confidence scores are numeric indicators of an AI system's certainty about its output, such as a medical diagnosis. Understanding how they are derived is crucial: they typically come from the model's internal probability estimates, computed from patterns in the training data, rather than from any validated measure of real-world accuracy. Similar scores appear in everyday consumer applications, such as the match percentages shown on dating apps, and that familiarity can create expectations of reliability that do not transfer to critical decisions in healthcare.
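To illustrate the point above, here is a minimal sketch of how a model-style "confidence score" is typically produced: raw model scores are pushed through a softmax to form a probability distribution, and the top probability is displayed as the confidence. The diagnosis labels and score values are hypothetical, purely for illustration.

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # subtract max for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores a model might assign to three candidate diagnoses.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# The "confidence score" shown to a user is often just the largest probability.
# Note what this is: a statement about the model's internal distribution over
# options, not a validated statement about how often such outputs are correct.
confidence = max(probs)
```

The key takeaway is that nothing in this computation consults real-world outcomes; a high softmax probability can coexist with a high error rate.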
When clinicians view generative AI summaries alongside confidence scores in patient records, there is a danger of accepting these technological inputs at face value. Such implicit trust can lead to serious errors when healthcare professionals prioritize machine outputs over their clinical judgment.
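To make the risk of face-value trust concrete, here is a toy calibration check using entirely invented audit data: a tool that reports roughly 90% confidence on every case, while retrospective review finds only 6 of 10 outputs were correct. Comparing average stated confidence against observed accuracy exposes the miscalibration gap.

```python
def empirical_accuracy(predictions):
    """predictions: list of (stated_confidence, was_correct) pairs."""
    correct = sum(1 for _, ok in predictions if ok)
    return correct / len(predictions)

# Toy audit log (illustrative data, not from any real system): the tool
# reported 0.9 confidence on each case, but only 6 of 10 held up on review.
log = [(0.9, True)] * 6 + [(0.9, False)] * 4

stated = sum(conf for conf, _ in log) / len(log)  # average stated confidence
actual = empirical_accuracy(log)                  # observed accuracy
gap = stated - actual                             # 0.30 miscalibration gap
```

A simple audit like this, run periodically against reviewed cases, tells an organization far more about a tool's reliability than the confidence number displayed on screen.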
Confidence scores are usually presented as percentages indicating the likelihood of accuracy, posing several risks to clinicians who may lack data science training:
Rather than relying on flawed confidence scores, I propose three strategies for improving user understanding of AI outputs:
Clinicians need technology that supports their expertise rather than encouraging reliance on potentially misleading confidence scores. By pairing AI insights with real-world clinical context, the healthcare sector can harness the benefits of AI responsibly, improving workflow efficiency while ensuring safer patient care.
This post is part of the MedCity Influencers program, welcoming diverse perspectives on healthcare innovation. For more, visit MedCity News.