The rapid expansion of artificial intelligence (AI) technologies is leading to an increase in instances of harmful responses, including hate speech, copyright infringements, and inappropriate content. Researchers have highlighted the urgent need for improved standards and rigorous testing protocols to mitigate these undesirable behaviors, which have surfaced due to insufficient regulations surrounding AI model development.
According to Javier Rando, an expert in adversarial machine learning, there remains a significant gap in understanding how to consistently guide AI models to behave as intended. After nearly 15 years of research, he states, “The answer… is, no, we don’t know how to do this, and it doesn’t look like we are getting better.” This admission underscores the complexity of managing machine learning models, especially as their deployment becomes more prevalent.
One viable approach to assess risks associated with AI systems is “red teaming, ” a method borrowed from cybersecurity where individuals test and probe systems to uncover potential threats. Shayne Longpre, a researcher leading the Data Provenance Initiative, points out a significant shortfall in the availability of skilled professionals working in red teams. While AI startups often rely on first-party evaluators or contracted second parties for testing, expanding this process to include third-party users, researchers, and ethical hackers could significantly bolster evaluation efforts.
Longpre adds that some flaws detected within AI systems necessitate input from specialized experts, such as lawyers and medical professionals, highlighting the complexity behind determining whether a system’s output is faulty. To address these challenges, he advocates for standardized ‘AI flaw’ reports and mechanisms to disseminate information regarding identified issues within AI models.
Marrying user-centered evaluative practices with governance frameworks can enhance awareness of AI-related risks. Project Moonshot, launched by Singapore’s Infocomm Media Development Authority, illustrates a proactive approach in this direction. The initiative encompasses a large language model evaluation toolkit, developed with industry partners like IBM and DataRobot, which integrates benchmarks, red teaming, and testing baselines.
This framework aims to enable startups to ensure that their models are trustworthy and operationally safe for users. Anup Kumar, head of client engineering for data and AI at IBM Asia Pacific, emphasizes that the evaluation process should be continuous, occurring both prior to and following the deployment of AI models. Project Moonshot also plans to include customizable features tailored to various industry needs and promote multilingual and multicultural assessments.
Pierre Alquier, a professor of statistics at ESSEC Business School, highlights the troubling tendency of tech companies to rush AI model releases without rigorous testing. He compares the AI approval process unfavorably to that of pharmaceuticals and aviation, where product safety is heavily scrutinized before market entry. Alquier argues for the development of specific AI models tailored to targeted tasks, which would streamline the anticipation and regulation of potential misuse.
Rando concurs, noting that the broad capabilities of large language models (LLMs) complicate the task of defining safety and security standards. He cautions against tech companies overstating the robustness of their defenses against misuse, suggesting a need for balanced perceptions of AI model safety solutions.
The experts unanimously agree that the establishment of stricter standards and an emphasis on comprehensive AI testing are paramount to developing safer, more reliable AI technologies that serve users effectively.