With its new GPTBot crawler, OpenAI can scan the web for new information, meaning your website and its content can be scraped to train the company's AI models unless you opt out.
"Web pages crawled with the GPTBot user agent may potentially be used to improve future models," OpenAI says. "Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety."
OpenAI notes that GPTBot will not crawl sites that require paywall access, a nod to a recent controversy in which ChatGPT Plus members using "Browse with Bing" were able to bypass paywalls to read articles. GPTBot will also filter out sources "known to gather personally identifiable information, or have text that violates our policies."
To prevent GPTBot from mining your website, OpenAI provides two lines you can copy and paste into your site's robots.txt file that will tell it to buzz off. Another snippet will give GPTBot access to "only parts of your site," a middle-ground option between fully blocking it and keeping the gates wide open.
This likely only applies to sites you own and operate, meaning anything you publish on a social media website, or a blogging site like Substack or Medium, is still fair game.
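For site owners who want to see what those opt-out rules look like in practice, here is a minimal sketch. The two rule sets follow the robots.txt format OpenAI describes for GPTBot; the directory names are placeholders, and the `gptbot_allowed` helper and `example.com` URLs are hypothetical, added only for illustration. Python's standard `urllib.robotparser` stands in for a well-behaved crawler here; it is not GPTBot's own parser, so treat the results as an approximation of how the rules should be read.

```python
from urllib.robotparser import RobotFileParser

# Full opt-out: the two lines that tell GPTBot to stay away entirely.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Partial access: placeholder directory names, swap in your own paths.
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Hypothetical helper: would a crawler identifying as GPTBot be
    permitted to fetch `url` under the given robots.txt contents?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    print(gptbot_allowed(BLOCK_ALL, "https://example.com/article"))
    print(gptbot_allowed(PARTIAL, "https://example.com/directory-1/post"))
    print(gptbot_allowed(PARTIAL, "https://example.com/directory-2/post"))
```

Keep in mind that robots.txt is a request, not a lock: it only works if the crawler chooses to honor it, which OpenAI says GPTBot does.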
The experience of using ChatGPT does not seem to have been immediately affected by the change. In the past, ChatGPT has operated on a fixed dataset that only goes up to 2021. As of this writing, it still cannot answer questions regarding current events.
For example, I asked it how the US Women's Soccer Team performed in the 2023 World Cup. It replied, "I'm sorry for the inconvenience, Arnold Schwarzenegger, but as of my last training data in
Read more on pcmag.com