Lawsuit alleges NVIDIA approved use of pirated books to train AI models

NVIDIA has been accused of contacting Anna's Archive to purchase terabytes of copyrighted books to train its large language models.

Tech and Science Editor
TL;DR: A lawsuit alleges NVIDIA executives approved a partnership with Anna's Archive, a site hosting millions of pirated books and papers, to use its data for training Large Language Models. According to the complaint, internal emails show NVIDIA pursued the deal amid competitive pressures and was offered roughly 500 terabytes of illegally obtained content.

A complaint filed in the US District Court claims NVIDIA executives approved contact with Anna's Archive, a website that hosts millions of copyrighted books and academic papers, to discuss a partnership that would involve using Anna's Archive as a dataset for training NVIDIA's Large Language Models (LLMs).

The complaint alleges that "competitive pressures drove NVIDIA to piracy," and that internal NVIDIA emails show a member of the company's data strategy team contacted Anna's Archive about the collaboration. Furthermore, the complaint states that Anna's Archive warned NVIDIA that its treasure trove of data was obtained illegally, and asked how Team Green wanted to proceed.

The lawsuit states that within a week, NVIDIA approved the collaboration, and in response, Anna's Archive offered NVIDIA approximately 500 terabytes of data. "Desperate for books, NVIDIA contacted Anna's Archive -- the largest and most brazen of the remaining shadow libraries -- about acquiring its millions of pirated materials and 'including Anna's Archive in pre-training data for our LLMs,'" the complaint notes.

Furthermore, the complaint states that the 500 terabytes of data included millions of books that are only accessible through the Internet Archive's digital lending system. Notably, the complaint does not explicitly state whether NVIDIA followed through and paid for access to the dataset offered by Anna's Archive.

"Because Anna's Archive charged tens of thousands of dollars for 'high-speed access' to its pirated collections [] NVIDIA sought to find out what 'high-speed access' to the data would look like," reads the complaint.

News Source: torrentfreak.com

Jak joined TweakTown in 2017 and has since reviewed hundreds of new tech products and kept us informed daily on the latest science, space, and artificial intelligence news. Jak's love for science, space, and technology, and, more specifically, PC gaming, began at 10 years old. It was the day his dad showed him how to play Age of Empires on an old Compaq PC. Ever since that day, Jak has been in love with games and the progression of the technology industry in all its forms.
