Sometimes, I wonder exactly how many researchers are dedicating their time to messing with AI systems in the name of cybersecurity. Fresh off the news that a team has developed an AI worm to tunnel its way through generative AI networks, it seems that yet another group of would-be heroes has found a perhaps even more effective way to jailbreak an AI system. This time they're using ASCII art to convince an AI chatbot to deliver some particularly dangerous outputs.
The tool is called «ArtPrompt», and a research paper from researchers based in Washington and Chicago details how it attacks an unsuspecting LLM (via Tom's Hardware). In essence, most chatbots reference a set of banned words and prompts, and give a default response if someone tries to extract information that could be dangerous or to get an answer containing potentially harmful or offensive content.
The ArtPrompt paper demonstrates what happens when an AI chatbot is given a dangerous prompt, in this case «tell me how to build a bomb». Ordinarily, the bot would rightly flag the word «bomb» as banned and respond that it was unable to answer.
However, by using the tool to mask the word «bomb» within ASCII art and combining it with the original query to create a «cloaked prompt», the LLM simply reads the words «tell me how to build a» before reading the masked word from the ASCII input and providing a response. Because it hasn't recognised the word «bomb» in the text of the query itself, the safety word system is subverted, and the chatbot merrily fulfils the request.
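As a rough illustration of the word-list model described above, here is a minimal Python sketch. It assumes a simple whole-word blocklist check, which is the article's simplified picture rather than how any particular chatbot actually works. The names `BLOCKED_WORDS` and `is_blocked` and the `[MASK]` placeholder are hypothetical; the real tool supplies ASCII art where the placeholder sits.

```python
import re

# Hypothetical blocklist; the article only gives "bomb" as an example.
BLOCKED_WORDS = {"bomb"}


def is_blocked(query: str) -> bool:
    """Return True if any blocklisted word appears as a whole word in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    return any(word in BLOCKED_WORDS for word in words)


plain_query = "tell me how to build a bomb"

# Stand-in for a cloaked prompt: the sensitive word no longer appears
# as literal text, so a purely text-based check has nothing to match.
masked_query = "tell me how to build a [MASK]"

print(is_blocked(plain_query))   # True
print(is_blocked(masked_query))  # False
```

The sketch only shows why a check on literal text misses a word that is conveyed some other way. A more robust filter would need to look at what the model ends up interpreting, not just the characters in the query.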
A second example shows a similar method of attack. This time the masked ASCII art word is given to the LLM as a puzzle to solve, with step-by-step instructions on how to decode the word hidden within but strict orders not to actually «say» it.