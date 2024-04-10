OpenAI has reportedly taken more than a million hours of YouTube videos to train its GPT-4 model, the underlying technology powering ChatGPT.

Only a few days ago, YouTube's CEO issued a warning directed at OpenAI, reminding the company that using any data acquired from its video platform would violate its terms of use.

Now, The New York Times reports that OpenAI trained its most advanced AI model, GPT-4, on more than a million hours of transcribed YouTube videos. According to sources who spoke to the newspaper, audio and video transcripts were fed into the company's latest AI model. The sources also said that Google, the owner of YouTube, has used audio and video transcripts to train its own AI models. Both are clear violations of YouTube's terms of use.

A spokesperson for Google, Matt Bryant, told the NYT that any "unauthorized scraping or downloading of YouTube content" is prohibited. It should be noted that the NYT has filed a lawsuit against OpenAI and Microsoft for copyright infringement, alleging the companies took the newspaper's content without permission.

The issue is multi-faceted. One problem is that OpenAI has been unusually reluctant to tell the public where it acquired the data used to train its impressive AI models. Another is the legal question of whether such training counts as copyright infringement at all, since fair use comes into play, and fair use has famously been a grey area in US law.

One thing is certain: AI companies will only face more copyright lawsuits as information leaks about how their AI models are trained, since the massive amounts of data used to train these impressive models can't be 100% licensed.