A newly published paper explores what happens when an AI model is fed content created by another AI model. The results are fascinating and illustrate the importance of training AI models on authentic data.
A team of researchers from Rice University and Stanford University fed AI models content that was AI-generated and discovered that the quality of the models' output diminished over time. AI models are trained on specific data sets, and according to the researchers' results, which were published in the new paper, a model trained on data generated by an AI will begin to break down, or, as the researchers put it, go "MAD".
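To make the idea of a model feeding on its own output more concrete, here is a minimal toy sketch. It is not the researchers' experiment: it assumes a one-dimensional Gaussian as the "model", uses the fitted standard deviation as a rough stand-in for output diversity, and the function `run_loop` and its parameters (`generations`, `sample_size`, `fresh_fraction`) are hypothetical names chosen for illustration.

```python
import random
import statistics


def run_loop(generations, sample_size, fresh_fraction, seed=0):
    """Toy self-consuming loop: fit a Gaussian, sample from it, refit.

    Returns the fitted standard deviation at each generation. All names
    and parameters are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    real_mu, real_sigma = 0.0, 1.0  # assumed "real data" distribution

    # Generation 0 trains on real data only.
    data = [rng.gauss(real_mu, real_sigma) for _ in range(sample_size)]
    fitted_sigmas = []

    for _ in range(generations):
        # "Train": estimate the mean and spread of the current training set.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        fitted_sigmas.append(sigma)

        # Build the next training set from the model's own samples,
        # optionally mixed with a share of fresh real samples.
        n_fresh = round(fresh_fraction * sample_size)
        n_synthetic = sample_size - n_fresh
        data = [rng.gauss(mu, sigma) for _ in range(n_synthetic)]
        data += [rng.gauss(real_mu, real_sigma) for _ in range(n_fresh)]

    return fitted_sigmas


if __name__ == "__main__":
    for fraction in (0.0, 0.5):
        sigmas = run_loop(generations=300, sample_size=20, fresh_fraction=fraction)
        print(f"fresh_fraction={fraction}: first={sigmas[0]:.3f}, last={sigmas[-1]:.3g}")
```

With `fresh_fraction=0.0` every generation learns only from the previous generation's samples, and in this toy setting the fitted spread tends to drift toward zero. Mixing in real samples each generation tends to keep it close to the real distribution's spread. A Gaussian fit is far simpler than a generative image or text model, so this only illustrates the feedback loop, not the scale of the effect.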
The researchers write that "MAD," which stands for "Model Autophagy Disorder," can affect any type of AI model, whether text, image, or video-based. It should be noted that the paper has yet to be peer-reviewed, which means readers should treat the findings with a healthy amount of skepticism until they are reviewed and replicated at a larger scale.
"Seismic advances in generative AI algorithms for imagery, text, and other data types has led to the temptation to use synthetic data to train next-generation models," the researchers write. "Repeating this process creates an autophagous ('self-consuming') loop whose properties are poorly understood."
"Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease," they added. "We term this condition Model Autophagy Disorder (MAD)."
"Since the training datasets for generative AI models tend to be sourced from the Internet, today's AI models are unwittingly being trained on increasing amounts of AI-synthesized data," the researchers write, adding that the "popular LAION-5B dataset, which is used to train state-of-the-art text-to-image models like Stable Diffusion, contains synthetic images sampled from several earlier generations of generative models."
"Formerly human sources of text are now increasingly created by generative AI models, from user reviews to news websites, often with no indication that the text is synthesized," they add. "As the use of generative models continues to grow rapidly, this situation will only accelerate."