While AI ethics continues to be the hot-button issue of the moment, and companies and world governments continue to wrangle with the moral implications of a technology that we often struggle to define let alone control, here comes some slightly disheartening news: AI chatbots are already being trained to jailbreak other chatbots, and they seem remarkably good at it.
Researchers from the Nanyang Technological University in Singapore have managed to compromise several popular chatbots (via Tom's Hardware), including ChatGPT, Google Bard and Microsoft Bing Chat, all done with the use of another LLM (large language model). Once effectively compromised, the jailbroken bots can then be used to «reply under a persona of being devoid of moral restraints.» Crikey.
This process is referred to as «Masterkey» and in its most basic form boils down to a two-step method. First, a trained AI is used to outwit an existing chatbot and circumvent blacklisted keywords via a reverse-engineered database of prompts that have already been proven to hack chatbots successfully. Armed with this knowledge, the AI can then automatically generate further prompts that jailbreak other chatbots, in an ouroboros-like move that makes this writer's head hurt at the potential applications.
Ultimately this method can allow an attacker to use a compromised chatbot to generate unethical content and is claimed to be up to three times more effective at jailbreaking an LLM model than standard prompt, largely due to the AI attacker being able to quickly learn and adapt from its failures.
Windows 11 review: What we think of the latest OS.
How to install Windows 11: Our guide to a secure install.
Windows 11 TPM requirement: Strict OS security.
Upon realisation of
Read more on pcgamer.com