Publishers are blocking the Internet Archive for fear AI scrapers can use it as a workaround

The Internet Archive has often been a valuable resource for journalists, whether for finding records of deleted tweets or providing academic texts for background research. However, the advent of AI has created a new tension between the archive and publishers. A few major publications have begun blocking the nonprofit digital library's access to their content based on concerns that AI companies' bots are using the Internet Archive's collections to indirectly scrape their articles.

"A lot of these AI businesses are looking for readily available, structured databases of content," Robert Hahn, head of business affairs and licensing for The Guardian, told Nieman Lab. "The Internet Archive’s API would have been an obvious place to plug their own machines into and suck out the IP."

The New York Times took a similar step. "We are blocking the Internet Archive's bot from accessing the Times because the Wayback Machine provides unfettered access to Times content, including by AI companies, without authorization," a representative from the newspaper confirmed to Nieman Lab. Subscription-focused publication the Financial Times and social forum Reddit have also made moves to selectively block how the Internet Archive catalogs their material.

Many publishers have sued AI businesses over how they access content used to train large language models. To name a few just from the realm of journalism:

- The New York Times sued OpenAI and Microsoft
- The Center for Investigative Reporting sued OpenAI and Microsoft
- The Wall Street Journal and New York Post sued Perplexity
- A group of publishers including The Atlantic, The Guardian and Politico sued Cohere
- Penske Media sued Google
- The New York Times and the Chicago Tribune sued Perplexity

Other media outlets have sought financial deals before offering up their libraries as training material, although those arrangements seem to provide compensation to the publishing companies rather than the writers.
And that's not even delving into the copyright and piracy fights that other creative fields, from fiction writers to visual artists to musicians, are also waging against AI tools. The whole Nieman Lab story is well worth a read for anyone who has been following any of these creative industries’ responses to artificial intelligence.

This article originally appeared on Engadget at https://www.engadget.com/ai/publishers-are-blocking-the-internet-archive-for-fear-ai-scrapers-can-use-it-as-a-workaround-204001754.html?src=rss
