Meta accused of pirating 82 terabytes of books from 'shadow libraries' to train AI

Leaked court documents reveal Meta has been accused of downloading 82 terabytes of data from illegal sources to train its artificial intelligence.


Jak Connor

Tech and Science Editor

Published Feb 11, 2025 12:01 AM CST


TL;DR: Leaked court documents indicate that Meta is accused of illegally downloading 82 terabytes of data to train its artificial intelligence systems.


Sophisticated AI chatbots need to be trained on large swaths of data, but things get murky when the companies behind them are asked the big questions: where did this data come from, and was it obtained legally?

Since these chatbots surged in popularity, the companies behind them have been accused of stealing copyrighted data, which is then used to train the AI, increasing its sophistication and, ultimately, the price the company charges for access. Obtaining datasets legally means companies must pay licensing fees for copyrighted material and agree to conditions set by the owner of the data. Why accept that expense and those stipulations when a dataset can be pirated the same way a movie can be illegally downloaded?

Companies such as OpenAI are currently embroiled in copyright lawsuits for these reasons, and OpenAI isn't the only one. A lawsuit against Meta has recently been leaked online, and it accuses the Mark Zuckerberg-run company of obtaining 82 terabytes (TB) of books from illegal sources for AI training. The lawsuit states Meta illegally downloaded the books from "shadow libraries" such as Anna's Archive, Z-Library, and LibGen, and it quotes a Meta researcher who opposed the use of pirated material.


Highlights include:

- A senior AI researcher at Meta says, "I don't think we should use pirated material. I really need to draw a line there."
- Another AI researcher says, "using pirated material should be beyond our ethical threshold" ... "SciHub, ResearchGate, LibGen are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they're infringing it".
- In January 2023, Mark Zuckerberg attends a meeting that is heavily redacted in the court documents. However, he says "we need to move this stuff forward" and "we need to find a way to unblock all of this".
- Fast forward to April 2023, and Meta employees discuss using a VPN to conceal Meta IP address ranges when torrenting data. They also discuss the need to involve lawyers if something goes wrong. The unredacted court records show a Meta employee saying, "torrenting from a corporate laptop doesn't feel right 😂".
