Wikipedia offers AI developers a training dataset to maybe get scraper bots off its back

Wikipedia has been struggling with the impact that AI crawlers — bots that are scraping text and multimedia from the encyclopedia to train generative artificial intelligence models — have been having on its servers, leading to increased costs and slower load times for human users in some cases. Perhaps in an effort to stop the bots from pummeling the public Wikipedia website and soaking up too much bandwidth, the Wikimedia Foundation (which manages Wikipedia's data) is offering AI developers a dataset they can freely use.
The organization has teamed up with Kaggle, a data science platform, to offer up a beta release of a structured dataset in both English and French. According to Google — which owns Kaggle — the dataset is formatted for machine learning to make it more useful for training, development and data science.
Wikimedia Enterprise notes that the dataset includes "abstracts, short descriptions, infobox-style key-value data, image links and clearly segmented article sections." There are no references or other "non-prose elements," such as video clips. The lack of references could make the issue of attribution for information in the dataset somewhat foggy. However, Wikimedia Enterprise (a part of the Wikimedia Foundation that seeks to make Wikipedia data available through APIs) says that the content in the dataset is freely licensed under Creative Commons, the public domain and so on since it's all from Wikipedia.

This article originally appeared on Engadget at https://www.engadget.com/ai/wikipedia-offers-ai-developers-a-training-dataset-to-maybe-get-scraper-bots-off-its-back-143255593.html?src=rss

