Cloudflare introduces default blocking of AI data scrapers

The Star Online - Tech·2025-07-02 11:01

SAN FRANCISCO: Cloudflare, a tech company that helps websites secure and manage their internet traffic, said Tuesday that it had rolled out a new permission-based setting that allows customers to automatically block artificial intelligence companies from collecting their digital data, a move that has implications for publishers and the race to build AI.

With Cloudflare’s new setting, websites can block by default the online bots that scrape their data, so a bot can collect the content only after the website owner grants it access, the company said. In the past, any bot that Cloudflare did not flag as belonging to a hacker or malicious actor could get through to a website and gather its information.

“We’re changing the rules of the internet across all of Cloudflare,” said Matthew Prince, the CEO of the company, which provides tools that protect websites from cyberattacks and helps them load content more efficiently. “If you’re a robot, now you have to go on the toll road in order to get the content of all of these publishers.”

Cloudflare is making the change to protect original content on the internet, Prince said. If AI companies freely use data from various websites without permission or payment, people will be discouraged from creating new digital content, he said. The company, which says its network of servers handles about 20% of internet traffic, has seen a sharp increase in AI data crawlers on the web.

Data for AI systems has become an increasingly contentious issue. OpenAI, Anthropic, Google and other companies building AI systems have amassed reams of information from across the internet to train their AI models. High-quality data is particularly prized because it helps AI models become more proficient in generating accurate answers, videos and images.

But website publishers, authors, news organizations and other content creators have accused AI companies of using their material without permission and payment. Last month, Reddit sued Anthropic, saying the startup had unlawfully used the data of its more than 100 million daily users to train its AI systems. In 2023, The New York Times sued OpenAI and its partner, Microsoft, accusing them of infringing the copyright of news content used in AI systems. OpenAI and Microsoft have denied those claims.

Some publishers have struck licensing deals with AI companies to receive compensation for their content. In May, the Times agreed to license its editorial content to Amazon for use in the tech giant’s AI platforms. Axel Springer, Condé Nast and News Corp. have also entered into agreements with AI companies to receive revenue for the use of their material.

Mark Howard, the chief operating officer of Time, said he welcomed Cloudflare’s move. Data scraping by AI companies threatens anyone who creates content, he said, adding that news publishers like Time deserved fair compensation for what they published.

Still, what Cloudflare is enabling “is really just the very, very first step in what will be a very long process,” he said. “But you have to start somewhere, and you have to start at some time.”

OpenAI, Anthropic and Google did not respond to requests for comment.

Cloudflare began considering how to help online publishers about 18 months ago, Prince said. For the past few decades, getting people to go to their websites was how publishers and content creators made money, he said. But AI has changed those dynamics, with people increasingly turning to AI tools like ChatGPT instead of a search engine or a primary source article.

Prince said he was “deeply concerned that the incentives for content creation are dead.” Last July, Cloudflare introduced an optional setting that let website publishers block AI scrapers if they wanted. That led to the default setting announced Tuesday.

AI companies that do not pay for content will ultimately lose out on access to it, Prince said.

“I am 100% confident we can block them from accessing the content,” he said. “And if they don’t get access to the content, then their product will be worse.”

©2025 The New York Times Company

This article originally appeared in The New York Times.