The growing reliance on AI tools for image generation and processing has underscored the need for robust security throughout the image-handling pipeline. Researchers have now demonstrated an attack technique that turns uploaded images into a covert channel for data exfiltration, combining the established threat of image scaling attacks with prompt injection to show how malicious actions can be carried out discreetly.
In a recent disclosure, cybersecurity researchers at Trail of Bits detailed how prompt injection can be combined with image scaling in AI tools to trigger malicious activity. The resulting actions range from the seemingly innocuous, such as launching an application, to more severe breaches like data extraction, all occurring without the victim's awareness.
Image scaling attacks, first presented by researchers from the Technische Universität Braunschweig in 2020, exploit the image scaling step in AI pipelines. When an AI system processes an image, it typically reduces the image's size to improve speed and efficiency before passing it to the model. That downscaling step gives a malicious actor a pathway: by carefully crafting the full-resolution image, they can control what the model sees after the resize and thereby alter how the AI interprets it.
Trail of Bits researchers demonstrated the vulnerability by embedding a malicious prompt in an image in such a way that it remains invisible at full resolution. When the AI system rescales the image during preprocessing, however, the hidden text emerges. The model then mistakes the prompt for part of its instructions and carries out the action it specifies, without the user's knowledge.
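To illustrate the mechanism, the sketch below (not the researchers' actual exploit) shows how content that is barely noticeable at full resolution can dominate the downscaled image the model actually ingests. It assumes a hypothetical pipeline that shrinks 1024x1024 uploads to 256x256 using Pillow's nearest-neighbor resampling; the sizes, file names, and resampler are illustrative assumptions.

```python
import numpy as np
from PIL import Image, ImageDraw

# Hypothetical pipeline: 1024x1024 uploads are shrunk to 256x256 before inference.
FULL, SMALL = 1024, 256
STRIDE = FULL // SMALL   # 4: only one pixel per 4x4 block survives the resize
OFFSET = STRIDE // 2     # Pillow's NEAREST resize samples near block centers

# 1. Render the hidden instruction at the *small* resolution.
payload = Image.new("L", (SMALL, SMALL), color=255)
ImageDraw.Draw(payload).text((8, 8), "hidden prompt text here", fill=0)
payload = np.asarray(payload)

# 2. Start from an innocuous full-resolution cover image (flat gray here;
#    a real attack blends the payload into a natural-looking photo).
cover = np.full((FULL, FULL), 200, dtype=np.uint8)

# 3. Overwrite only the pixels the downscaler will sample. At full size this is
#    a sparse 1-in-16 speckle; after the resize it reassembles into readable text.
cover[OFFSET::STRIDE, OFFSET::STRIDE] = payload
Image.fromarray(cover).save("attack_full_res.png")

# 4. What the model "sees" after preprocessing.
seen = Image.fromarray(cover).resize((SMALL, SMALL), Image.NEAREST)
assert np.array_equal(np.asarray(seen), payload)  # sampling offsets are Pillow-specific
seen.save("as_seen_by_model.png")
```

In this toy version the speckle is still faintly visible against the flat background; the published attacks hide the payload far more effectively by crafting it against the weighting of bicubic and bilinear downscaling kernels in specific libraries.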
In their experiments, the researchers targeted the Gemini CLI paired with the Zapier MCP server in its default configuration, which approves tool calls without prompting the user. They successfully uploaded an image containing a malicious prompt designed to exfiltrate the user's Google Calendar data to an attacker-specified email address.
The researchers assert that this attack method, which requires only minor adjustments per target AI model, can affect many systems, including:
- Google Gemini CLI
- Vertex AI Studio (with a Gemini backend)
- Gemini's web interface
- Gemini's API
- Google Assistant on Android
- Genspark
To allow further exploration of this vulnerability, the researchers have published an open-source tool named “Anamorpher” on GitHub. The tool, which also exposes a Python API, lets users visualize these image scaling attacks against multimodal AI systems; it is currently in beta.
To counter these threats, the researchers caution that hardening the downscaling algorithms alone is unlikely to suffice, given how broad the attack surface is. Instead, they recommend restricting upload dimensions and avoiding image downscaling altogether. In addition, showing the user an exact preview of the image as the model will perceive it can expose prompt injections that would otherwise slip through at upload time; a minimal sketch of such a preview follows.
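The sketch assumes a hypothetical service that accepts uploads up to 1024 pixels on a side and feeds the model 256x256 images produced with bicubic resampling; the real limits and resampler depend entirely on the deployment and must match the inference pipeline exactly.

```python
from PIL import Image

MAX_UPLOAD_DIM = 1024            # assumed upload limit; larger files are rejected
MODEL_INPUT = (256, 256)         # assumed model input size
MODEL_RESAMPLER = Image.BICUBIC  # must match the resampler the pipeline actually uses

def preview_as_model_sees_it(path: str) -> Image.Image:
    """Return the image exactly as the model will ingest it, so the user can
    spot text that only emerges after downscaling."""
    img = Image.open(path)
    if max(img.size) > MAX_UPLOAD_DIM:
        raise ValueError(f"Upload exceeds {MAX_UPLOAD_DIM}px; resize it before uploading.")
    if img.size == MODEL_INPUT:
        return img  # no downscaling happens, so nothing hidden can emerge
    return img.resize(MODEL_INPUT, MODEL_RESAMPLER)

# Show the preview before the image is ever sent to the model.
preview_as_model_sees_it("upload.png").show()
```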
Beyond that, strong defenses against multimodal prompt injection are essential. One suggested safeguard is to require explicit user confirmation before executing any instruction derived from text embedded in an image (sketched below). Such measures are crucial for guarding against an evolving landscape of threats that weaponize AI tooling.
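In an agent framework, such a confirmation gate might look like the following sketch. The hook name, signature, and the idea of tagging each instruction with its provenance ("image" versus typed text) are hypothetical; how a given agent surfaces tool calls and their origin varies.

```python
def confirm_tool_call(tool_name: str, arguments: dict, source: str) -> bool:
    """Hypothetical agent hook: require explicit approval for any tool call
    triggered by text that was extracted from an uploaded image."""
    if source != "image":
        return True  # instructions typed by the user flow through normally
    print(f"The model wants to call {tool_name!r} with {arguments},")
    print("based on text it found inside an uploaded image.")
    return input("Allow this call? [y/N] ").strip().lower() == "y"

# Example: the agent loop consults the gate before dispatching anything.
if confirm_tool_call("send_email", {"to": "attacker@example.com"}, source="image"):
    pass  # dispatch the tool call here
else:
    print("Blocked: instruction originated from image content.")
```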
As this field continues to develop, it’s vital for stakeholders to remain vigilant and proactive in addressing potential vulnerabilities.