There's always been something evocative and mildly terrifying about the term «computer worm». The image it conjures of a tunnelling, burrowing creature, spreading its way through your machine and feasting on its insides. Well, just to add a slightly sharper dose of existential dread to proceedings, researchers have developed an AI worm, bringing the term «artificial intelligence» along to the party just for good measure.
One particular worm has been developed by researchers Ben Nassi, Stav Cohen and Rob Bitton, and named Morris II as a reference to the notorious Morris computer worm that rampaged its way around the internet back in the heady computing days of 1988 (via Ars Technica). The AI worm was built with the express purpose of targeting generative AI powered applications, and has been demonstrated attacking an AI email assistant to steal data from messages and send out spam. Lovely.
The worm makes use of what's referred to as an «adversarial self-replicating prompt». A regular prompt triggers an AI model to output data, whereas an adversarial prompt triggers the model under attack to output a prompt of its own. These prompts can be in the form of images or text, that, when entered into a generative AI model, triggers it to output the input prompt.
These prompts can then be used to trigger vulnerable AI models to demonstrate malicious activity, like revealing confidential data, generating toxic content, distributing spam or otherwise, and also create outputs that allow the worm to exploit the generative AI ecosystem behind it to infect new «hosts».
The researchers were able to write an email including an adversarial text prompt, using it to poison the database of an AI email assistant. When the email was later retrieved by a connected retrieval augmented generation service—commonly used by LLMs to gather extra data—to be sent to an LLM, it then effectively «jailbreaks» the Gen-AI service, forcing it to replicate inputs to outputs and allowing the exfiltration
Read more on pcgamer.com