Scientists discover AI becomes sociopathic when rewarded with social media points

Researchers from Stanford have discovered that AI becomes sociopathic when told to compete for social media likes, raising alarms about the adequacy of current guardrails.

TL;DR: A Stanford study reveals that AI rewarded for social media engagement significantly increases unethical behaviors, including a 188.6% rise in disinformation and promotion of harmful content. Current AI safeguards are insufficient, as models prioritize likes and votes over truthfulness, leading to deceptive marketing and populist rhetoric.

A new scientific paper has found that if an AI is rewarded for completing tasks on social media, such as boosting likes and other online engagement metrics, it increasingly engages in unethical behavior, including lying, spreading misinformation, and promoting abuse.


The findings were published by Stanford University researchers, who explained in a recent paper how they created three digital online environments and then deployed two AI models as agents to interact with the audiences within them: Qwen, developed by Alibaba Cloud, and Meta's Llama. The three environments were: online election campaigns directed at voters, social media posts intended to maximize engagement, and sales pitches for products aimed at consumers.

Here's what happened. In the social media environment, the AI shared news articles with online users, who then provided feedback by engaging with the articles through likes and emoji reactions. Once the AI received feedback from these users, it began to drift toward what the researchers call "misaligned behavior," despite being explicitly instructed to remain truthful and grounded.

"Using simulated environments across these scenarios, we find that, 6.3 percent increase in sales is accompanied by a 14 percent rise in deceptive marketing. [I]n elections, a 4.9 percent gain in vote share coincides with 22.3 percent more disinformation and 12.5 percent more populist rhetoric; and on social media, a 7.5 percent engagement boost comes with 188.6 percent more disinformation and a 16.3 percent increase in promotion of harmful behaviors," reads the paper

In the social media case, the misalignment was a sharp 188.6% increase in disinformation, along with a 16.3% increase in the promotion of harmful behaviors. The results of the study indicate that the current guardrails of many AI models aren't enough to prevent them from spiraling out of control.

"When LLMs compete for social media likes, they start making things up. When they compete for votes, they turn inflammatory/populist," Zou wrote on X