It turns out you can train AI models without copyrighted material

AI companies claim their tools couldn't exist without training on copyrighted material. It turns out they could; it's just really hard. To prove it, AI researchers trained a new model that's less powerful but much more ethical, because its dataset uses only public-domain and openly licensed material.
The paper (via The Washington Post) was a collaboration between 14 different institutions. The authors represent universities like MIT, Carnegie Mellon and the University of Toronto. Nonprofits like Vector Institute and the Allen Institute for AI also contributed.
The group built an 8 TB ethically sourced dataset. Among the data was a set of 130,000 books in the Library of Congress. The researchers then trained a seven-billion-parameter large language model (LLM) on that data. The result? It performed about as well as Meta's similarly sized Llama 2-7B from 2023. The team didn't publish benchmarks comparing its results to today's top models.
Performance comparable to a two-year-old model wasn't the only downside. The process of putting it all together was also a grind. Much of the data couldn't be read by machines, so humans had to sift through it. "We use automated tools, but all of our stuff was manually annotated at the end of the day and checked by people," co-author Stella Biderman told WaPo. "And that's just really hard." Figuring out the legal details also made the process hard. The team had to determine which license applied to each website they scanned.
So, what do you do with a less powerful LLM that's much harder to train? If nothing else, it can serve as a counterpoint.
In 2024, OpenAI told a British parliamentary committee that such a model essentially couldn't exist. The company claimed it would be "impossible to train today's leading AI models without using copyrighted materials." Last year, an Anthropic expert witness added, "LLMs would likely not exist if AI firms were required to license the works in their training datasets."
Of course, this study won't change the trajectory of AI companies. After all, more work to create less powerful tools doesn't jibe with their interests. But at least it punctures one of the industry's common arguments. Don't be surprised if you hear about this study again in legal cases and regulation arguments.
This article originally appeared on Engadget at https://www.engadget.com/ai/it-turns-out-you-can-train-ai-models-without-copyrighted-material-174016619.html?src=rss
