Google has unveiled a new feature, Google-Extended, that lets website publishers exclude their data from the development of Google's AI models. Websites that opt out will still remain accessible through Google Search; the tool simply gives publishers greater control over whether their content is used for AI training.
The move addresses concerns among web publishers who want to keep their data out of AI model training. Google-Extended lets publishers manage whether their websites help improve generative AI products such as Bard and the Vertex AI generative APIs, giving them precise control over content access on their sites while preserving their data privacy rights, The Verge reported.
Earlier this year, Google confirmed that it was training its AI chatbot, Bard, using publicly available data scraped from the web. This announcement sparked concerns and prompted publishers to seek ways to shield their content from being used for AI training purposes, much like the approach taken by major news outlets such as the New York Times, CNN, Reuters, and Medium.
Unlike blocking most other web crawlers, blocking Google's crawler carries a cost: Google's indexing is integral to a website's discoverability in search results, so shutting out its crawlers entirely could harm a site's online presence. To address this, some publishers have turned to legal measures instead, such as updating their terms of service to prohibit companies from using their content for AI training.
Google-Extended is controlled through robots.txt, the file that tells web crawlers which parts of a site they may access.
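As a rough illustration of how such a robots.txt rule behaves, the sketch below uses Python's standard `urllib.robotparser` to check a hypothetical policy that disallows the Google-Extended token site-wide while leaving ordinary search crawling untouched (the `example.com` URL and the rule itself are illustrative, not taken from any real site):

```python
# Sketch: a hypothetical robots.txt policy that opts a site out of AI
# training via the Google-Extended token, while leaving Googlebot
# (search indexing) unaffected.
import urllib.robotparser

ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Google-Extended is blocked from the whole site...
print(rp.can_fetch("Google-Extended", "https://example.com/article"))  # False
# ...while Googlebot, which has no matching rule, defaults to allowed.
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # True
```

Because robots.txt rules are matched per user-agent token, a publisher can block Google-Extended without affecting how Googlebot indexes the site for search.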
Read more on tech.hindustantimes.com