The adoption of artificial intelligence (AI) in healthcare is rapidly advancing, presenting both opportunities and challenges. While leaders in healthcare often express optimism regarding AI innovation, many information technology professionals harbor concerns about the trustworthiness of these tools. Given the critical nature of healthcare, there is an urgent need to determine the reliability of AI-generated outputs.

Understanding AI Confidence Scores

Recently, various groups have pushed to present confidence scores as a measure of AI reliability in medical contexts. These scores, which are typically internal model estimates rather than validated probabilities, can be misleading. This is particularly critical in healthcare, where large language models (LLMs) may produce scores that do not accurately reflect true reliability, fostering an unfounded sense of certainty among practitioners.

As a professional in healthcare technology with a keen interest in AI developments, I contend that trusting confidence scores can create significant risk. In the sections that follow, I describe these risks and propose more effective alternatives that enable AI use while safeguarding organizational integrity.

The Mechanics of Confidence Scores

Confidence scores are numeric indicators of an AI system’s certainty about its output, such as a medical diagnosis. Understanding their derivation is crucial: these scores are typically computed from the model’s internal statistics over its training data, not from a validated measure of real-world accuracy. Similar scores appear in consumer applications, such as the match percentages shown in dating apps, and that familiarity can create expectations of reliability that do not transfer to critical decisions in healthcare.
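
To make this concrete, here is a minimal Python sketch of how such a score is often produced. The logits and condition names are hypothetical, and real systems vary; the point is that the displayed percentage is a normalized model output, not a verified probability of being right.

    import numpy as np

    # Hypothetical raw model outputs (logits) for three candidate diagnoses.
    # "Condition A/B/C" are placeholder names, not real model output.
    logits = np.array([2.1, 0.4, -1.3])
    labels = ["Condition A", "Condition B", "Condition C"]

    # A displayed "confidence score" is often just a normalized transform
    # (softmax) of these internal values.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    for label, p in zip(labels, probs):
        print(f"{label}: {p:.0%}")
    # Prints roughly 82%, 15%, 3%. The 82% reflects the model's internal
    # state shaped by training data, not the chance it is right for this patient.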

When clinicians view generative AI summaries alongside confidence scores in patient records, there is a danger of accepting these technological inputs at face value. Such implicit trust can lead to serious errors when healthcare professionals prioritize machine outputs over their clinical judgment.

Examining Risks Associated with Confidence Scores

Confidence scores are usually presented as percentages indicating the likelihood of accuracy, posing several risks to clinicians who may lack data science training:

  • Misunderstanding of context: Most AI workflows are built on population-level data, which may not reflect the specific patient populations a given provider serves. As a result, confidence scores can mislead clinicians into making decisions based on generalized data rather than precise, localized insights.
  • Overreliance on displayed scores: High confidence scores can encourage users to gloss over complexities in the underlying data, fostering automation bias, where clinicians trust AI outputs too readily and may miss critical symptoms.
  • Misrepresentation of accuracy: The statistical probabilities behind a high confidence score may not hold true for an individual patient, generating a false sense of security in the AI’s recommendations (the calibration sketch after this list illustrates the gap).
  • False security generates errors: By adhering too closely to AI outputs, clinicians risk overlooking alternative diagnoses, with downstream consequences for patient care and administrative processes.
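
The calibration gap mentioned above can be checked directly. The sketch below uses made-up evaluation data to compare a tool’s stated confidence with its observed accuracy; in a well-calibrated system the two would roughly match.

    import numpy as np

    # Made-up evaluation data: the confidence the tool displayed for each
    # prediction, and whether that prediction turned out to be correct.
    confidences = np.array([0.95, 0.92, 0.90, 0.88, 0.85, 0.83, 0.80, 0.78])
    correct = np.array([1, 0, 1, 0, 1, 0, 1, 0])

    # A basic calibration check: stated confidence vs. observed accuracy.
    print(f"Mean stated confidence: {confidences.mean():.0%}")  # ~86%
    print(f"Observed accuracy:      {correct.mean():.0%}")      # 50%
    # In a well-calibrated system these numbers would roughly agree; a gap
    # this large is the overconfidence that can mislead clinicians.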

Alternative Approaches for Enhancing Trust

Rather than relying on flawed confidence scores, I propose three strategies for improving user understanding of AI outputs:

  • Localize and frequently update AI models: Incorporating localized data on specific health conditions and populations significantly increases relevancy and reliability. Regular updates ensure models reflect current healthcare trends and standards.
  • Thoughtfully design outputs for users: Tailoring AI displays to match the user’s perspective—whether for clinicians or data scientists—enhances comprehension and effective decision-making. Instead of a confidence score, presenting comparative data can provide greater context.
  • Support clinical judgment, don’t replace it: Effective AI tools should augment human expertise rather than substitute for it. Presenting a range of diagnostic options, ranked from most to least likely, lets clinicians apply their professional judgment in decision-making (see the sketch after this list).
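
As one illustration of that last point, a tool might render a ranked differential with brief rationales instead of a single opaque percentage. The condition names and rationales below are invented; the sketch only shows the shape of such an output.

    # Hypothetical output design: a ranked differential with brief rationales
    # rather than one confidence number. All names here are placeholders.
    differential = [
        ("Condition A", "strong match: symptoms X and Y both present"),
        ("Condition B", "partial match: symptom X only"),
        ("Condition C", "weak match: consider if risk factor Z applies"),
    ]

    print("Possible diagnoses, most to least likely (apply clinical judgment):")
    for rank, (condition, rationale) in enumerate(differential, start=1):
        print(f"  {rank}. {condition}: {rationale}")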

It is essential for clinicians to utilize technology that supports their expertise and discourages reliance on potentially misleading confidence scores. By integrating AI insights with real-world context, the healthcare sector can harness the benefits of AI responsibly, optimizing workflow efficiency and ultimately ensuring safer patient care.

