An analysis conducted by Press Gazette has revealed that a significant proportion of the top 100 English-language news websites block AI web crawlers from accessing their content. Of the 106 sites examined, 45 were found to block no AI crawlers at all, while the remaining sites apply varying degrees of restriction.
Insights into AI crawler blocking trends among top news websites
More than four in ten of the surveyed news websites allow all AI web crawlers to scrape their content. The majority, 61 sites, block at least one AI bot. Of those, 32 block two or more AI crawlers, with some barring as many as five.
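The shares implied by these counts can be checked with a few lines of arithmetic. The sketch below uses only the figures reported in the analysis (106 sites, 45 blocking none, 61 blocking at least one, 32 blocking two or more); the labels and the helper name share are illustrative, not part of the analysis.

```python
# Counts reported in the Press Gazette analysis.
TOTAL_SITES = 106
COUNTS = {
    "no AI crawlers blocked": 45,
    "at least one AI crawler blocked": 61,
    "two or more AI crawlers blocked": 32,
}


def share(count, total=TOTAL_SITES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in COUNTS.items():
    print(f"{label}: {count}/{TOTAL_SITES} = {share(count)}%")
```

The 45 sites with no blocks work out to roughly 42.5% of the sample, which matches the "more than four in ten" description, and the 45 and 61 groups together account for all 106 sites.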
Leading the list of blocked AI crawlers is GPTBot, OpenAI's web crawler associated with ChatGPT, which 56.6% of the surveyed websites disallow. Close behind is Google-Extended, another frequently blocked crawler, which is used for Google’s AI chatbot Gemini (previously named Bard).
Additionally, crawlers such as Claude-Web, Claudebot, anthropic-ai, Cohere-ai, Perplexity-ai, Seekr, and Meltwater face varying degrees of restriction across the surveyed websites.
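The article does not describe how sites block these crawlers. A common mechanism is a robots.txt file that disallows specific user agents, and the following is a minimal sketch under that assumption. It uses Python's standard urllib.robotparser module to report which of the named crawlers a given robots.txt would block. The sample robots.txt content, the example.com URL, and the function name blocked_crawlers are hypothetical and not taken from any surveyed site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; not from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# A few of the crawler names mentioned in the article.
AI_CRAWLERS = ["GPTBot", "Google-Extended", "Claudebot", "anthropic-ai"]


def blocked_crawlers(robots_txt, crawlers, url="https://example.com/"):
    """Return the crawlers that the given robots.txt text bars from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in crawlers if not parser.can_fetch(name, url)]


print(blocked_crawlers(SAMPLE_ROBOTS_TXT, AI_CRAWLERS))
# prints ['GPTBot', 'Google-Extended']
```

Note that robots.txt is advisory: it only records a site's stated preference, and this check says nothing about whether a given crawler honors it.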
Notable exclusions and inclusions
While some major publishers block certain AI bots, others impose no restrictions. For instance, the Mirror, the Express, Manchester Evening News, Ladbible, Unilad, and the Lebedev-owned Independent and Evening Standard allow unrestricted access to AI crawlers.
Similarly, Politico, a subsidiary of Axel Springer, permits access to AI crawlers because of a content-sharing agreement with OpenAI.
Notably, the Daily Beast, owned by IAC, does not block any AI bots, even though the company’s chairman has advocated for AI companies to compensate publishers. Some politically conservative websites, including GB News, Newsmax, Zero Hedge, Breitbart, and Fox News, likewise choose not to block AI crawlers. In the case of Fox News, this diverges from other publications under the Murdoch-owned umbrella.
Implications and future outlook
The varying approaches news publishers take to AI crawler access reflect the ongoing debate over content usage and intellectual property rights in the digital era. Some publishers keep strict control over their content to guard against unauthorized use, while others prioritize accessibility and collaboration with AI companies for content dissemination and innovation.
As the landscape continues to evolve, it remains to be seen how publishers, AI companies, and regulatory bodies will navigate the complex intersection of technology, content ownership, and user privacy.
The decisions made by news publishers regarding AI crawler access not only impact the dissemination of news but also shape the broader conversation surrounding digital content usage and intellectual property rights.