Since artificial intelligence-powered text-generation tools were made widely available to the public in the past few months, they’ve been heralded by some as the future of email, internet search, and content generation. But these AI-powered tools also have some clear shortcomings: They are often inaccurate, and they can generate answers that reinforce racial biases, for example. There are also serious ethical concerns about the undisclosed data they are trained on.
It is not surprising that debates over using these tools have also been happening in fandom spaces. Excited fans almost immediately turned to them as a new way of exploring their favorite characters. With the right prompt, AI can spit out a few paragraphs of fic-like writing. But just as quickly, many fanfic writers began to speak out against the practice.
“Where is the AI sourcing information it is using? Other fanfictions, the actual original source, other stories? Would the AI be plagiarizing other fanfiction authors?” asks author Redd, who says that “overall, AI should be kept far away from fanfiction.”
The machine learning behind these AI tools typically requires them to be trained on large amounts of preexisting data, often scraped from the internet. But it’s hard to know exactly what gets swept up in that process, or how much copyrighted writing is included. OpenAI, the company behind the GPT-4 system used as a basis for some of the popular text-generation tools currently available, has not revealed what data the model was trained on. (OpenAI did not respond to a request for comment on this topic in time for publication.) However, its predecessor, GPT-3, was trained on text filtered down from roughly 45 terabytes of data, much of it taken from a broad web crawl.
Given the apparent extensiveness of this