Reddit sues Anthropic over unpaid use of training data

Tech in Asia·2025-06-05 13:01

Reddit has sued AI startup Anthropic, alleging that it used Reddit’s data to train AI models without permission and in violation of Reddit’s user agreement.

The complaint was filed in a Northern California court on June 4, 2025.

This is the first major lawsuit by a tech company against an AI provider over data training practices, joining others by publishers and authors against companies like OpenAI, Microsoft, and Meta.

Reddit’s chief legal officer, Ben Lee, said the company won’t allow its content to be used commercially without compensating users or protecting their privacy.

Reddit has licensing deals with OpenAI and Google, which it says include terms that protect user data.

Reddit says it warned Anthropic not to scrape its site, but Anthropic ignored those warnings and robots.txt files that block automated data collection.

Reddit alleges that Anthropic scraped its site more than 100,000 times, even after saying in 2024 that it had stopped.

Anthropic denies the claims and says it will defend itself.

🔗 Source: TechCrunch

🧠 Food for thought

1️⃣ The emergence of a two-tier AI data economy

Reddit’s approach illustrates how platforms are strategically dividing AI companies into partners and adversaries based on willingness to negotiate licensing deals.

The company has established formal agreements with Google and OpenAI worth approximately $60 million annually, which now account for roughly 10% of Reddit’s total revenue 1.

This licensing approach marks a significant shift in how social platforms monetize user-generated content, creating a formal market for training data that was previously often scraped without compensation 2.

Reddit’s decision to license data to some companies while suing others demonstrates how platforms are increasingly asserting control over their data assets, forcing AI companies to choose between paying for access or facing legal challenges.

The platform’s updated Data API Terms explicitly prohibit unauthorized use of user content for AI training, establishing clear boundaries that Anthropic allegedly crossed by continuing to scrape content after being notified 3.

2️⃣ Legal precedents from earlier scraping cases will shape this litigation

Reddit’s lawsuit builds upon a complex legal landscape established by previous data scraping cases, particularly the landmark hiQ Labs v. LinkedIn ruling.

In that case, the Ninth Circuit ruled that scraping publicly available data was not a violation of the Computer Fraud and Abuse Act (CFAA), potentially complicating Reddit’s claims against Anthropic 4.

However, Reddit’s allegations that Anthropic ignored robots.txt files and continued scraping after being explicitly denied permission are an attempt to distinguish this case from the hiQ precedent 5.

The social platform is pursuing multiple legal theories beyond CFAA violations, including breach of contract and unfair business practices, reflecting the evolving strategies content owners are using to protect their data 5.

This case represents a critical test of whether terms of service and explicit denials of access can overcome the precedent that publicly accessible data may be legally scraped, a question with enormous implications for the AI industry 6.
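For readers unfamiliar with the robots.txt mechanism at issue, here is a minimal sketch of how a well-behaved crawler would check a site's rules before fetching a page, using Python's standard `urllib.robotparser`. The rules, the `ExampleAIBot` and `OtherBot` user agents, and the `example.com` URLs are hypothetical and are not taken from Reddit's actual robots.txt or from the complaint. Note that robots.txt is a voluntary convention: it states the site's wishes but does not technically prevent access, which is part of why its legal weight is contested.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# This is NOT Reddit's actual file.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the hypothetical rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from memory instead of downloading them.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "OtherBot"):
        for url in ("https://example.com/r/python/",
                    "https://example.com/private/page"):
            print(agent, url, is_allowed(agent, url))
```

A compliant crawler would skip any URL for which `is_allowed` returns `False`; nothing in the protocol itself enforces that choice.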

3️⃣ Reddit joins a growing wave of content creators seeking compensation for AI training

Reddit’s lawsuit against Anthropic reflects a broader trend of content creators and platforms demanding compensation for the use of their intellectual property in AI training.

Over 25 copyright infringement lawsuits are currently pending against AI companies, with publishers like The New York Times, authors, and music creators all challenging the unauthorized use of their work 7.

These cases collectively question whether AI companies can claim “fair use” when using copyrighted materials for commercial AI development, representing a fundamental challenge to how large language models are built 8.

Recent court rulings have emphasized the requirement of human authorship for copyright protection while still recognizing that using copyrighted works for AI training may constitute infringement 7.

The outcome of these cases, including Reddit’s lawsuit, will likely reshape the economics of AI development by determining whether companies must pay to access training data or can continue to use publicly available content without compensation 9.
