Meta accused of training AI with illegal content, documents reveal engineer 'torrenting'

Meta is the latest company to face AI copyright claims, accused of using copyrighted data to train its AI model 'Llama'


Jak Connor

Tech and Science Editor

Published Jan 16, 2025 10:10 AM CST
Updated Jan 21, 2025 9:19 AM CST


TL;DR: AI companies like Meta, OpenAI, and NVIDIA face lawsuits for allegedly using copyrighted data to train models without authorization. Meta is accused of using pirated content, including from 'LibGen,' for its AI model 'Llama.' Notable figures like George RR Martin and Sarah Silverman have also initiated legal actions against AI firms.


One of the most striking aspects of AI large language models is the sheer amount of data on which they're trained. However, the way that data is gathered has put companies like OpenAI and NVIDIA under scrutiny.


Meta is the latest to be drawn into this controversy, following a series of lawsuits claiming that AI companies have used copyrighted data to train their models. In Meta's case, the model in question is 'Llama.' The recent case against Meta, "Kadrey et al. vs. Meta Platforms", was brought by plaintiffs including two novelists, Richard Kadrey and Christopher Golden. Like the previous cases, the lawsuit was filed on the basis that the company used copyrighted content without authorization.

Following a recent order from the U.S. District Court for the Northern District of California, a range of documents were made public that appear to support that claim. In a documented conversation between Meta employees, an engineer says: "torrenting from a [Meta-owned] corporate laptop doesn't feel right."

Another conversation within the documents suggested that "MZ" had authorized the use of pirated content, further reinforcing the claims. The documents also indicate that Meta used content from 'LibGen', a database of pirated books, to train its AI model.

It wouldn't be unexpected for copyrighted material to show up in large, open-source data sets, but the active accumulation of this material on company property is a different story. With copyright lawsuits against AI tech companies continuing to snowball, we'll keep an eye out for the court's ruling to see what precedent might be established.

Author George RR Martin and comedian Sarah Silverman are two notable figures who have commenced proceedings against OpenAI.