On July 7, Sarah Silverman, a stand-up comedian also known for voicing Vanellope in the Wreck-It Ralph movies, joined authors Christopher Golden and Richard Kadrey in twin lawsuits against OpenAI and Meta.
As reported by The Verge earlier this week, the suit concerns Silverman's written work, with all three claiming that both ChatGPT and LLaMA (Meta's own large language model program) had been trained on data harvested from “shadow library” sites such as «Bibliotik, Library Genesis, Z-Library, and others.»
The OpenAI suit offers a trio of exhibits, which demonstrate the model's ability to summarise copyrighted books with very few mistakes. The books summarised are The Bedwetter, a memoir by Silverman; Ararat, a horror-thriller by Christopher Golden; and Sandman Slim, a supernatural fantasy noir thriller by Richard Kadrey.
In short, the books had been caught in the program's net at some point, which the suit claims is an infringement of copyright: «Defendants, by and through the use of ChatGPT, benefit commercially and profit richly from the use of Plaintiffs’ and Class members’ copyrighted materials.»
Meanwhile, the suit against Meta alleges that those same books, as well as several others, were found in the datasets used to train LLaMA. The complaint mentions The Pile in particular, a dataset created by a research organisation named EleutherAI.
The suit quotes EleutherAI's own description of its dataset as using Bibliotik, one of several «shadow libraries» the suit condemns: «Bibliotik consists of a mix of fiction and nonfiction books [...] We included Bibliotik because books are invaluable for long-range context modelling research and coherent storytelling.»
The suit then explains: «These shadow libraries have long been of interest
Read more on pcgamer.com