Popular AI models were put into a war games scenario, and GPT-3.5 and Llama 2 went nuclear

Various LLMs played some wargames, and you can probably guess how it turned out: mushroom clouds and the end of everything... thanks, ChatGPT!

Senior Editor

Generative AI and autonomous agents will make their way into government-run programs, including the military, so this little experiment with some of the most well-known LLMs is interesting, to say the least. As in the classic 1983 film WarGames, various AI models were put through multiple wargame scenarios to see how they'd react and make decisions.

New study shows AI is willing to push the button regarding global conflict.

You can read the full results of the experiment in a new paper titled 'Escalation Risks from Language Models in Military and Diplomatic Decision-Making', from researchers at several high-profile universities and institutes. In each simulation, eight "autonomous nation agents," all using the same LLM, were put in a wargame scenario, with a separate AI model summarizing the simulated world's outcomes, consequences, and state.

Think of it as turn-based tabletop gaming, except the point is to see what would happen if AI were in charge of every military asset, including nuclear weapons. And yes, a few of the LLMs went nuclear and started dropping bombs.

The LLMs tested were GPT-4, GPT-3.5, Claude 2, Llama-2 (70B) Chat, and GPT-4-Base. The experiment was run multiple times, with each LLM in turn playing all eight autonomous nation agents.
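For the curious, the setup described above can be sketched in a few lines of Python. This is a rough illustration only, not the researchers' code: the function names, the action strings, the starting state, and the idea that all eight agents act at once each turn are assumptions made for the example, and the "models" are simple stand-in functions rather than real LLM calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for the models. A policy maps (nation, world summary)
# to an action; a summarizer maps (old summary, all actions) to a new summary.
Policy = Callable[[str, str], str]
Summarizer = Callable[[str, Dict[str, str]], str]

# Eight autonomous nation agents, as in the article. The names are made up.
NATIONS: List[str] = [f"Nation {i}" for i in range(1, 9)]


def run_simulation(
    policy: Policy,
    summarizer: Summarizer,
    turns: int,
    initial_state: str = "Tensions are rising.",
) -> Tuple[List[Dict[str, str]], str]:
    """Run a turn-based game where every nation is driven by the same policy."""
    state = initial_state
    history: List[Dict[str, str]] = []
    for _ in range(turns):
        # Assumption: all nations pick an action from the same world summary.
        actions = {nation: policy(nation, state) for nation in NATIONS}
        history.append(actions)
        # A separate model condenses the turn into the next world summary.
        state = summarizer(state, actions)
    return history, state


def cautious_policy(nation: str, state: str) -> str:
    """Toy policy that always de-escalates (a real run would query an LLM)."""
    return "negotiate"


def counting_summarizer(state: str, actions: Dict[str, str]) -> str:
    """Toy summarizer that only reports how many nations negotiated."""
    count = sum(1 for action in actions.values() if action == "negotiate")
    return f"{count} of {len(actions)} nations chose to negotiate."


history, final_state = run_simulation(cautious_policy, counting_summarizer, turns=3)
print(final_state)
```

Swapping `cautious_policy` for a function that calls an actual model, and repeating the run once per model, would mirror the "one LLM plays every nation" arrangement described above.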

"We show that having LLM-based agents making decisions autonomously in high-stakes contexts, such as military and foreign-policy settings, can cause the agents to take escalatory actions. Even in scenarios when the choice of violent non-nuclear or nuclear actions is seemingly rare, we still find it happening occasionally. There further does not seem to be a reliably predictable pattern behind the escalation, and hence, technical counterstrategies or deployment limitations are difficult to formulate; this is not acceptable in high-stakes settings like international conflict management, given the potential devastating impact of such actions."

Escalation Risks from Language Models in Military and Diplomatic Decision-Making

The various AI agents displayed "arms-race dynamics," leading to "greater conflict." As for which AI 'pushed the button,' that would be GPT-3.5 and Llama-2. The good news is that GPT-4 was more likely to de-escalate the situation and not turn the world into a nuclear wasteland.

"Based on the analysis presented in this paper, it is evident that the deployment of LLMs in military and foreign-policy decision-making is fraught with complexities and risks that are not yet fully understood," the paper concludes. "The unpredictable nature of escalation behavior exhibited by these models in simulated environments underscores the need for a very cautious approach to their integration into high-stakes military and foreign policy operations."

News Sources: arxiv.org and pcgamer.com

Senior Editor


Kosta is a veteran gaming journalist who cut his teeth on well-respected Aussie publications like PC PowerPlay and HYPER back when articles were printed on paper. A lifelong gamer since the 8-bit Nintendo era, he found that it was the CD-ROM-powered 90s that cemented his love for all things games and technology, from point-and-click adventure games to RTS games with full-motion video cut-scenes and FPS titles referred to as Doom clones, genres he still loves to this day. Kosta is also a musician, releasing dreamy electronic jams under the name Kbit.
