Put "Google" and "Reddit" in the same sentence and you're bound to get a cacophony of sighs from those in the online publishing biz. Well, we might now hear more sighs from the average internet user, too, as it looks like Google is currently the only search engine that can scrape Reddit and put new posts in its search results.
404 Media clocked on to this yesterday, pointing out that search engines other than Google, such as Bing and DuckDuckGo, aren't returning any Reddit posts from the past week. This does seem to be the case, and you can test it yourself: go to another search engine like DuckDuckGo, search for "site:reddit.com", and filter the results to the past week. As of the time of writing, no results come up for that search on DuckDuckGo, but they do on Google.
This seems to be down to changes to Reddit's robots.txt file. Robots.txt is a file, found on pretty much every website, that tells bots such as search engine crawlers which pages on the site they're "disallowed" from scraping. As well as keeping search engines away from certain pages, the file has become a handy way for websites to stop their data being scraped for AI training, by disallowing AI crawlers.
It looks like Reddit, however, has recently changed its file to disallow any bot at all from scraping the website. You don't have to take our word for this, either: you can check for yourself by visiting https://www.reddit.com/robots.txt. The bottom couple of lines on the page essentially tell every bot that it isn't allowed to scrape any of Reddit's pages. And if there's no scraping, there's no showing up in search results. That's how search engines work. To simplify it, they scrape, they rank, and they display pages when users search for related terms.
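To see what a blanket disallow means in practice, here's a minimal sketch using Python's standard-library robots.txt parser, `urllib.robotparser`. Both files below are hypothetical, simplified examples, not copies of Reddit's actual robots.txt: one blocks only a single AI crawler (GPTBot is used purely as an example name), the other disallows every bot from every page. The subreddit URL is likewise just an example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: blocks one AI crawler, lets every other bot in.
BLOCK_AI_ONLY = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical file: a blanket rule that disallows every bot from every path.
BLOCK_EVERYONE = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://www.reddit.com/r/pcgaming/"  # example URL only

for agent in ("GPTBot", "Bingbot", "DuckDuckBot", "Googlebot"):
    print(agent,
          "| AI-only file:", can_crawl(BLOCK_AI_ONLY, agent, URL),
          "| block-everyone file:", can_crawl(BLOCK_EVERYONE, agent, URL))
```

Under the AI-only file, only GPTBot is refused; under the blanket rule, every crawler, Googlebot included, gets `False`. Worth remembering: robots.txt is a voluntary convention that well-behaved crawlers choose to respect. The file itself doesn't technically stop anything from fetching pages.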
But Google is still showing new Reddit posts in its search results, which means it's somehow able to access Reddit's content despite the robots.txt disallow.
If we start to wonder whether the
Read more on pcgamer.com