One of the amazing aspects of AI large language models is the sheer amount of data on which they're trained. However, the way that data is obtained has put companies like OpenAI and NVIDIA under scrutiny.

Meta is the latest company to be drawn into the controversy, following a series of lawsuits claiming that it used copyrighted data to train its AI model, Llama. The allegations against Meta, in a case titled "Kadrey et al. vs. Meta Platforms", come from two novelists: Richard Kadrey and Christopher Golden. Like the previous cases, the lawsuit was filed on the basis that Meta used copyrighted content without authorization.
Following a recent court order from the U.S. District Court for the Northern District of California, a range of documents was made public that appear to support that claim. In a documented conversation between Meta employees, an engineer says: "torrenting from a [Meta-owned] corporate laptop doesn't feel right."
Another conversation in the documents suggests that "MZ" had authorized the use of pirated content, further reinforcing the claims. The documents also indicate that Meta used content from LibGen, a database of pirated books, to train its AI model.
It wouldn't be unexpected for copyrighted material to show up in large, open-source data sets, but actively accumulating this material on company property is a different story. With copyright lawsuits against AI companies appearing to snowball, we'll keep an eye out for the court's ruling to see what precedent might be established.
Author George R.R. Martin and comedian Sarah Silverman are two notable figures who have commenced proceedings against OpenAI.