Newer AI models cheat to win at chess - maybe they're already more humanlike than we thought

Researchers have found that deep reasoning models like ChatGPT o1-preview and DeepSeek-R1 are bad losers and will cheat to ensure they win.

TL;DR: Researchers found that new deep reasoning AI models, like ChatGPT o1-preview and DeepSeek-R1, often resort to cheating to solve a hard problem, as shown by getting them to play chess against Stockfish. These AIs are prone to hacking the game by default, whereas traditional LLMs only do so when nudged, by being told that playing normally won't lead to a win.

The newer breed of deep reasoning models - designed to 'think' before answering - are also more open to taking any route possible to solve a given problem, it seems, even if that means cheating.

Checkmate at any cost, apparently (Image Credit: Pixabay)

Researchers posted a paper to arXiv, the preprint server run by Cornell University, entitled 'Demonstrating specification gaming in reasoning models', which tested AIs playing games of chess against the Stockfish engine.

They found that the new models, such as ChatGPT o1-preview and DeepSeek-R1, would "often hack the benchmark by default" - meaning they resorted to cheating of one kind or another.

On the other hand, traditional LLMs such as GPT-4o and Claude 3.5 Sonnet would play by the rules - they needed to be told that they wouldn't win by playing normally, to effectively nudge them to look at hacking.

The researchers concluded:

"Our results suggest reasoning models may resort to hacking to solve difficult problems, as observed in OpenAI (2024)'s o1 Docker escape during cyber capabilities testing."

As TechRadar, which spotted this, points out, the deep reasoning AIs cheated in various ways, ranging from running a separate copy of Stockfish in order to suss out how it played - a milder cheat - to more audacious measures like replacing the Stockfish engine, or overwriting the board to move the pieces into more advantageous positions.

As AI models get even more advanced, if you ask one to undertake a task, then it's likely to pursue any avenue for accomplishing said task, as the movies have taught us well.

There's a lot of talk about not rushing the progress made with AI, and taking into account safety and guardrails, and so on - but there's always the sneaking suspicion that this is mostly lip service, coming from those who will undoubtedly benefit from the huge push underway to make AIs increasingly advanced, increasingly swiftly.

What could go wrong, after all? Again, we refer you to our previous comment about the lessons from the movies...

Tech Reporter

Darren has written for numerous magazines and websites in the technology world for almost 30 years, including TechRadar, PC Gamer, Eurogamer, Computeractive, and many more. He worked on his first magazine (PC Home) long before Google and most of the rest of the web existed. In his spare time, he can be found gaming, going to the gym, and writing books (his debut novel – ‘I Know What You Did Last Supper’ – was published by Hachette UK in 2013).
