For sophisticated AI chatbots to exist, they need to be trained on large swaths of data. Things get murky when the companies behind these chatbots are asked the big question: where did this data come from, and was it obtained legally?
Since AI chatbots surged in popularity, the companies behind them have been accused of stealing copyrighted data to train their models, increasing the models' sophistication and, ultimately, the price the companies charge for access. Obtaining datasets legally means paying a licensing fee for copyrighted material and agreeing to whatever conditions the owner of the data sets. Why accept those costs and stipulations when a dataset can simply be pirated, the same way a movie can be illegally downloaded?
Companies such as OpenAI are currently embroiled in copyright lawsuits for these reasons, and OpenAI isn't the only one. A lawsuit against Meta that recently leaked online accuses the Mark Zuckerberg-run company of obtaining 82 terabytes (TB) of books from an illegal source for AI training. The lawsuit alleges that Meta illegally downloaded the contents of the books from "shadow libraries" such as Anna's Archive, Z-Library, and LibGen, and it quotes a Meta researcher who was against the use of pirated material.

Highlights include:
- A senior AI researcher at Meta says, "I don't think we should use pirated material. I really need to draw a line there."
- Another AI researcher says, "using pirated material should be beyond our ethical threshold" ... "SciHub, ResearchGate, LibGen are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they're infringing it."
- In January 2023, Mark Zuckerberg attends a meeting, the record of which is heavily redacted in court documents. The unredacted portions show him saying "we need to move this stuff forward" and "we need to find a way to unblock all of this."
- In April 2023, Meta employees discuss using a VPN to conceal Meta IP address ranges when torrenting data. They also discuss the need to involve lawyers if something goes wrong. The unredacted court records show a Meta employee saying, "torrenting from a corporate laptop doesn't feel right 😂".