World's 'first AI software engineer' fails 85% of its assigned tasks

A service touted as the 'first AI software engineer' successfully completed only 15% of its assigned tasks when independent researchers put it to the test.

Tech and Science Editor
TL;DR: Devin, an AI tool by Cognition AI, is marketed as able to autonomously build and fix code, but it has limitations. Criticized for errors and inefficiency, it completed only three of 20 tasks successfully in one test. Despite its potential, reliability issues persist.

In the midst of the 'AI Revolution', there's been plenty of speculation about AI taking away jobs, and no sector has been dealing with those fears more than the software engineering industry.

However, programmers can rest assured that one of the latest tools touted as a fully autonomous AI software engineer reportedly has its limitations. Devin is an AI programming tool originally released by Cognition AI in March of 2024. The tool, hailed as the "first AI software engineer," raised job security concerns among programmers, particularly given that the claims made for it included the ability to "build and deploy apps end to end" and "autonomously find and fix bugs in codebases."

Following its release, Cognition uploaded a video entitled "Devin's Upwork Side Hustle", which essentially claimed that the tool could make money by completing Upwork tasks. In April 2024, veteran software developer Carl Brown of the YouTube channel Internet of Bugs took to the platform to debunk some of the tool's claims, with criticisms such as:

"Devin didn't complete the advertised task. Instead, it generated errors in its own code and then fixed them"

Shortly after, the original poster of the Upwork ad released a video supporting Brown's claim.

Devin was rolled out to the general public in December 2024 at a price of $500 per month. Since then, the feedback has been somewhat similar. Three data scientists from Answer.AI, an AI research and development lab, tested Devin and found that it completed only three out of 20 tasks successfully.

The three researchers, Hamel Husain, Isaac Flath, and Johno Whitaker, described a similar pattern across their tests. They stated that "tasks that seemed straightforward often took days rather than hours" and that the tool had a concerning tendency to "press forward with tasks that weren't actually possible". The researchers did credit the tool, noting that it was impressive when it worked. However, they concluded: "that's the problem - it rarely worked."

NEWS SOURCE: theregister.com

Jak joined the TweakTown team in 2017 and has since reviewed 100s of new tech products and kept us informed daily on the latest science, space, and artificial intelligence news. Jak's love for science, space, and technology, and, more specifically, PC gaming, began at 10 years old. It was the day his dad showed him how to play Age of Empires on an old Compaq PC. Ever since that day, Jak fell in love with games and the progression of the technology industry in all its forms.
