Using ChatGPT can result in a mixed bag of helpful information and nonsensical answers, making it hard to evaluate the chatbot's overall performance. And the companies making generative AI tools, including OpenAI, Google, and Microsoft, are secretive about the data they use and how their AI models truly work.
To learn more about generative AI tools, 10 students and four faculty members at the University of California, Berkeley formed a group called the Large Model Systems Organization (LMSYS Org), within the AI research and computer science departments. LMSYS Org has created an experiment, the "Chatbot Arena," a custom website where anyone can anonymously chat with two models at once.
Once the user has formed an opinion on which chatbot's answers they prefer, they vote for a favorite and only afterward find out which models they were talking to. The site uses the same large language models (LLMs) that power ChatGPT and similar chatbots, repackaged in a new interface, since companies such as OpenAI have made them publicly available. The site also hosts smaller models created by individuals.
"We started this because we created our own AI model based off Meta's LLaMA model in April, [which we] called Vicuna, and we wanted to train different versions and iterate on it," says Hao Zhang, one of the professors at Berkeley leading the effort. "It mostly measures human preference, and its ability to follow instructions and do the task the human wants, which is a very important factor in making a model useful."
The group has steadily added more models to the arena, and since April, around 40,000 people have participated, Zhang says.
We tried the Chatbot Arena, below.
Read more on pcmag.com