Websites Block Tech Giants from Using Their Data to Train LLMs

A significant shift is underway. Top websites are starting to guard their content against tech giants like Google and OpenAI, changing the longstanding relationship between web publishers and search engines. The driver is the rise of artificial intelligence (AI) technologies.

Websites protect their content

Traditionally, websites have used a simple yet powerful tool known as `robots.txt` to manage how search engines interact with their content. In exchange for letting search engines crawl and index their pages, websites benefited from the traffic those search engines sent back. Advanced AI models have complicated this arrangement. Companies such as OpenAI and Google have been using vast amounts of online content to train their AI systems, and these systems can now answer user queries directly. That reduces the need for users to visit the original websites and disrupts the flow of traffic from search engines to those sites.


In response, Google introduced Google-Extended, a new `robots.txt` user-agent token that lets websites block the use of their content for training Google's AI models. Rolled out in September last year, it has been adopted by around 10% of the top 1,000 websites, including high-profile names like The New York Times and CNN.
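To make the mechanism concrete, here is a minimal Python sketch of how `robots.txt` rules can separate ordinary search crawling from AI training. The file contents and URL are hypothetical. `Google-Extended` and `GPTBot` are real user-agent tokens documented by Google and OpenAI, but the rule matching here uses Python's standard `urllib.robotparser` as a stand-in, not the logic either company actually runs. Note that `Google-Extended` is a control token honored by Google's existing crawlers rather than a separate bot that fetches pages, and that `urllib.robotparser` applies the first matching group rather than every detail of RFC 9309.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher: the AI-training tokens are
# disallowed site-wide, while every other crawler (e.g. regular search) is allowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_permissions(parser: RobotFileParser, url: str, agents: list[str]) -> dict[str, bool]:
    """Map each user-agent token to whether the parsed rules allow it to fetch url."""
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/news/article.html"  # hypothetical page
    for agent, allowed in crawl_permissions(rp, url, ["Googlebot", "Google-Extended", "GPTBot"]).items():
        print(f"{agent:16} allowed={allowed}")
```

Run as a script, it reports `Googlebot` as allowed and both `Google-Extended` and `GPTBot` as blocked. That is the split publishers are aiming for: remaining crawlable for search while opting out of AI training.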

Comparing adoption and looking ahead

While Google-Extended gives websites more control over their content, it has been adopted more slowly than blocks on other AI crawlers, such as OpenAI's GPTBot. The hesitance may stem from concern about visibility in future AI-driven search results: websites that block access to their content risk being overlooked by AI models and left out of answers to relevant queries.

The New York Times is a particularly telling case. The publication is engaged in a copyright dispute with OpenAI and has since taken a firm stance, using Google-Extended to block its content from being used for AI model training.

Google's experimental Search Generative Experience (SGE) hints at a potential shift in how information is curated and presented to users, placing AI-generated answers ahead of traditional search results. The decisions tech companies and web publishers make now will shape the digital ecosystem and influence how information is accessed and consumed in the AI age.

