OpenAI reportedly trained its best AI model on a million hours of YouTube data

OpenAI has reportedly taken more than a million hours of YouTube videos to train its GPT-4 model, the underlying technology powering ChatGPT.


Jak Connor

Tech and Science Editor

Published Apr 10, 2024 4:04 AM CDT
Updated Apr 19, 2024 12:01 AM CDT



It was only a few days ago that YouTube's CEO put out a warning directed at OpenAI, reminding the company that using any data acquired from its video platform would be a violation of its terms of use.


Now, The New York Times reports that OpenAI trained its most advanced AI model, GPT-4, on more than a million hours of transcribed YouTube videos. Sources told the newspaper that audio and video transcripts were fed into the company's latest AI model. These sources also said that Google, the owner of YouTube, has used audio and video transcripts to train its own AI models. Both practices are clear violations of YouTube's terms of use.

A spokesperson for Google, Matt Bryant, told the NYT that any "unauthorized scraping or downloading of YouTube content" is prohibited. It should be noted that the NYT has filed a lawsuit against OpenAI and Microsoft for copyright infringement, alleging the companies took the newspaper's content without permission.

Read more: Square Enix and other game studios issue notice to OpenAI to stop using their content for AI
Read more: Disney and Universal sue Midjourney, says AI firm is a 'bottomless pit of plagiarism'
Read more: Sony is seeking potential copyright damages of up to $4.5 billion

The issue is multi-faceted. OpenAI has been notably reluctant to tell the public where it acquired the data used to train its impressive AI models. Another problem is the legal question of whether copyright infringement has occurred at all, as fair use comes into play, which has famously been a grey area in US law.

One thing is for certain: AI companies will only face more copyright lawsuits as information leaks about how their AI models are trained, as the massive amounts of data used to train these impressive models can't be 100% licensed.
