
NVIDIA caught scraping 'human lifetime' of YouTube videos per day to train AI

Leaked emails, internal Slack chats, and documents have revealed that NVIDIA allegedly scraped millions of YouTube videos to train its AI products.

Junior Editor

Only last month, an investigative report found that Apple, NVIDIA, and many other big-name players in the AI race had used a public data set containing YouTube video transcripts to train their respective AI products, which is a violation of YouTube's terms of service (TOS).


YouTube has said in the past that any "unauthorized scraping or downloading of YouTube content" is strictly prohibited, and it's especially prohibited when that data is then used for commercial projects. Last month, a Proof News investigation found NVIDIA, Apple, and other AI companies used an academic data set containing subtitles from more than 170,000 YouTube videos to train AI models, and now NVIDIA has been caught in the spotlight again with a report from 404 Media.

According to the publication, which spoke with a former NVIDIA employee about the company's internal processes, employees were instructed to scrape videos from Netflix, YouTube, and other sources to add to the data sets being used to train an AI model for NVIDIA's Omniverse 3D world generator, self-driving car systems, a "digital human" AI avatar product, and the Cosmos deep learning model.

Additionally, the report states NVIDIA made efforts to hide its tracks from YouTube by running multiple "virtual machines" to avoid detection.

"We are finalizing the v1 data pipeline and securing the necessary computing resources," Ming-Yu Liu, NVIDIA's VP of Research and a leader on the Cosmos project, wrote in a May email, according to 404, "to build a video data factory that can yield a human lifetime visual experience worth of training data per day."

Internal conversations viewed by 404 Media revealed that when employees raised concerns about the source of the data and the ethics of how it was acquired, managers assured them they had clearance from the highest levels of the company to use the content for training.

"This is an executive decision," Liu wrote to a hesitant underling on one such occasion, according to Slack messages reviewed by 404. "We have an umbrella approval for all of the data."

NVIDIA was asked to comment on the report's allegations, and the driving force behind the AI push replied that its AI training practices are in "full compliance with the letter and the spirit of copyright law."

NEWS SOURCE: 404media.co

Junior Editor


Jak joined the TweakTown team in 2017 and has since reviewed hundreds of new tech products and kept us informed daily on the latest science, space, and artificial intelligence news. Jak's love for science, space, and technology, and, more specifically, PC gaming, began at 10 years old. It was the day his dad showed him how to play Age of Empires on an old Compaq PC. Ever since that day, Jak has been in love with games and the progression of the technology industry in all its forms.
