Meta accused of pirating 82 terabytes of books from 'shadow libraries' to train AI

Leaked court documents reveal Meta has been accused of downloading 82 terabytes of data from illegal sources to train its artificial intelligence.

Tech and Science Editor
TL;DR: Leaked court documents indicate that Meta is accused of illegally downloading 82 terabytes of data to train its artificial intelligence systems.

Sophisticated AI chatbots need to be trained on large swaths of data, but things get murky when the companies behind them are asked the big questions: where did this data come from, and was it obtained legally?

Since their massive rise in popularity, the companies behind these AI chatbots have been accused of stealing copyrighted data, which is then used to train the AI, increasing its sophistication and, ultimately, the price the company charges for access to it. Obtaining datasets legally means companies must pay a licensing fee for copyrighted material and agree to conditions set by the owner of the data. Why accept that expense and those stipulations when the dataset can simply be pirated in the same way a movie can be illegally downloaded?

Companies such as OpenAI are currently embroiled in copyright lawsuits for these reasons, and they are not alone. Court documents from a lawsuit against Meta have recently leaked online, accusing the Mark Zuckerberg-run company of obtaining 82 terabytes (TB) of books from illegal sources for AI training. The lawsuit states Meta illegally downloaded the contents of the books from "shadow libraries" such as Anna's Archive, Z-Library, and LibGen, and quotes a Meta researcher who was against the use of pirated material.


Highlights include:

- A senior AI researcher at Meta says, "I don't think we should use pirated material. I really need to draw a line there."

- Another AI researcher says, "using pirated material should be beyond our ethical threshold" ... "SciHub, ResearchGate, LibGen are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they're infringing it".

- In January 2023, Mark Zuckerberg attends a meeting, the record of which is heavily redacted in the court documents. However, he says "we need to move this stuff forward" and "we need to find a way to unblock all of this".

- Fast forward to April 2023: Meta employees discuss using a VPN to conceal Meta IP address ranges when torrenting data. They also discuss the need to involve lawyers if something goes wrong. The unredacted court records show a Meta employee saying, "torrenting from a corporate laptop doesn't feel right 😂".


News source: bgr.com


Jak joined the TweakTown team in 2017 and has since reviewed 100s of new tech products and kept us informed daily on the latest science, space, and artificial intelligence news. Jak's love for science, space, and technology, and, more specifically, PC gaming, began at 10 years old. It was the day his dad showed him how to play Age of Empires on an old Compaq PC. Ever since that day, Jak fell in love with games and the progression of the technology industry in all its forms.
