Every year, thousands of students take courses on deploying artificial intelligence (AI) models in health care, where the models are meant to help doctors diagnose disease and choose appropriate treatments. However, Leo Anthony Celi, a senior research scientist at MIT's Institute for Medical Engineering and Science and an associate professor at Harvard Medical School, argues that these courses are missing a crucial element: training future professionals to detect flaws in the datasets used to build the models. In a new paper, Celi documents these shortcomings in current course offerings and urges course developers to teach students to rigorously evaluate their data before using it.
Celi stresses that any problems in the training data carry through to the accuracy and reliability of the AI models built on it. He points to well-documented cases such as pulse oximeters, which tend to overestimate blood oxygen levels in people of color because the clinical trials behind them lacked diverse participants. That example reflects a broader systemic issue: many medical devices and algorithms are optimized for healthy young men and are rarely validated on other groups, such as older adults with chronic health conditions.
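As a hypothetical illustration of how underrepresentation in training data can surface as a subgroup performance gap, the sketch below trains one pooled classifier on a synthetic cohort in which group A outnumbers group B nine to one and the outcome depends on different features in each group. All names, sizes, and distributions here are assumptions made for demonstration, not details from Celi's paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic cohort: group A is heavily overrepresented (assumption for illustration).
n_a, n_b = 9000, 1000
X_a = rng.normal(loc=0.0, scale=1.0, size=(n_a, 5))
X_b = rng.normal(loc=0.8, scale=1.2, size=(n_b, 5))  # group B's features are shifted

# The outcome depends on a different feature in each group, so a model
# trained mostly on group A learns a signal that works poorly for group B.
y_a = (X_a[:, 0] + 0.1 * rng.normal(size=n_a) > 0.0).astype(int)
y_b = (X_b[:, 1] + 0.1 * rng.normal(size=n_b) > 0.8).astype(int)

X = np.vstack([X_a, X_b])
y = np.concatenate([y_a, y_b])
group = np.array(["A"] * n_a + ["B"] * n_b)

# One pooled model, dominated by the majority group's signal.
model = LogisticRegression().fit(X, y)
pred = model.predict(X)

print(f"pooled accuracy: {accuracy_score(y, pred):.2f}")
for g in ["A", "B"]:
    mask = group == g
    print(f"group {g} accuracy: {accuracy_score(y[mask], pred[mask]):.2f}")
```

The pooled accuracy looks reassuring because group A dominates it; only the stratified report exposes that group B is served far worse, which is exactly the kind of check Celi wants students trained to run.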
The electronic health record systems in place today further complicate this landscape. Built primarily for clinical and administrative workflows rather than as learning systems, these records do not adequately represent the diverse populations that AI models will ultimately serve. Initiatives to reform these systems are underway, but such changes are not imminent. In the meantime, researchers need ways to work with the flawed data that is available now, building models that correct for its biases rather than inherit them.
Celi notes that since the AI course at MIT began in 2016, there has been growing recognition of the pitfalls of focusing exclusively on model performance metrics while ignoring the quality of the underlying data. An analysis of online courses found that only about half of those surveyed discussed potential bias in the data at all, and only two covered it in any depth. This gap is risky: aspiring AI developers who are never taught to scrutinize their data may unwittingly build existing biases into their models.
Celi advocates fundamental changes in how AI courses are structured. He suggests that educators give students a framework of critical questions to guide their evaluation of a dataset, starting with where the data came from, who collected it, and which patient populations it represents. Understanding these facets is vital because bias can be introduced at the collection stage, before any model is trained, and it skews model outcomes from there.
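As one minimal sketch of what such a first-pass data evaluation could look like in code, the function below profiles a hypothetical tabular dataset for demographic representation and missingness before any modeling. The column names (`age`, `sex`, `race`, `collection_site`) and the toy records are illustrative assumptions, not part of Celi's framework.

```python
import pandas as pd

def profile_dataset(df: pd.DataFrame, demographic_cols: list[str]) -> None:
    """Print a basic representation-and-provenance audit before any modeling."""
    print(f"rows: {len(df)}")
    for col in demographic_cols:
        print(f"\n-- {col} --")
        # Share of each category, including missing values, so gaps in
        # representation are visible up front.
        print(df[col].value_counts(dropna=False, normalize=True).round(3))

    # Missingness per column: heavy missingness concentrated in certain
    # fields can itself be a signal of biased collection practices.
    print("\nmissing fraction per column:")
    print(df.isna().mean().round(3))

# Toy records, with values made up purely for illustration:
df = pd.DataFrame({
    "age": [34, 71, 45, None, 29],
    "sex": ["M", "M", "F", "M", None],
    "race": ["white", "white", "black", "white", "asian"],
    "collection_site": ["site_1", "site_1", "site_1", "site_2", "site_1"],
})
profile_dataset(df, ["sex", "race", "collection_site"])
```

Skewed category shares or missingness clustered in particular fields are exactly the kinds of collection-stage red flags the questions above are meant to surface.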
Additionally, the MIT Critical Data consortium has organized datathons that bring together mixed teams of health care professionals and data scientists to analyze local datasets side by side. Celi emphasizes that critical thinking does not develop in homogeneous environments; mixing professional backgrounds and generations enriches the learning experience and lets students surface biases in their work organically.
Celi returns to the need to acknowledge the data's inherent imperfections while still working to improve it. Events like datathons force participants to confront data quality head-on and to look for solutions rather than sidestep the problems. Recognizing the limitations of existing datasets, he argues, is an essential step toward building effective AI models, and students should leave these courses understanding both the risks these tools carry and the potential they hold.