A small research group recently examined the performance of 25 AI 'people', built on two large language models created by OpenAI, in an online Turing test. None of the AI bots ultimately passed the test, but all the GPT-3.5 ones were so bad that a chatbot from the mid-1960s was nearly twice as successful at passing itself off as human, mostly because real people didn't believe it was really an AI.
News of the work was reported by Ars Technica and it's a fascinating story. The Turing test itself was first devised by famed mathematician and computer scientist Alan Turing in the 1950s. The original version of the test involves having a real person, called an evaluator, talk to two other participants via a text-based discussion. The evaluator knows that one of the respondents is a computer but doesn't know which one.
If the evaluator can't tell which one is a computer or determines that they must both be humans, then the machine can be said to have passed the Turing test.
Cameron Jones and Benjamin Bergen of the University of California San Diego devised a two-player version of the Turing test, where an 'interrogator' asks questions of a 'witness' and then decides whether the witness is a human being or an AI chatbot. A total of 25 large language model (LLM) witnesses were created, based on the GPT-4 and GPT-3.5 models from OpenAI.
To get some baseline results, real people were also included, as was one of the first-ever chatbots, ELIZA, created in the mid-1960s (you can try it out yourself here). The AI witnesses were prompted as to the nature of the discussion, along with instructions on how they should respond. These included things like making spelling mistakes, how long to take before responding, and whether to claim to be a human or an AI.