AI Agents like OpenAI's 'Operator' have a long way to go before replacing humans

Users have had the chance to gather early impressions following the January 23 release of OpenAI's Operator. So far, they're a little underwhelming.

Ille Smolanko

TweakTown

Published Feb 3, 2025 8:00 PM CST

3-minute read time

TL;DR: OpenAI's Operator was hyped as a groundbreaking AI agent capable of autonomous tasks, but early impressions suggest it's slow, error-prone, and still requires heavy supervision.

OpenAI's Operator arrived with some strong claims attached in the lead-up to its January 23 launch: 'Ph.D.-level intelligence', the ability to autonomously carry out coding tasks, and the potential to exceed human capabilities. However, early user experiences with the tool suggest otherwise.

What sets AI agents like Operator apart from a chatbot like ChatGPT is that they're designed to act on your behalf: give them a task, and they'll take care of it with minimal oversight. Operator works by essentially taking control of a web browser for you, using a Computer-Using Agent (CUA) model that combines visual processing and reasoning to interpret what's happening on the screen and carry out actions in response.

Bloomberg's Rachel Metz spent some time with Operator, taking it through various day-to-day tasks: purchasing groceries, booking reservations, and filling out forms. The agent successfully ordered lipstick from Sephora, filled a cart with Ben & Jerry's ice cream, and even suggested adding items to qualify for free delivery. However, it fell short on simple tasks like filling out spreadsheets, managing calendars, and navigating unfamiliar web pages. A common thread among users was that the agent required constant supervision, and was not particularly efficient even when it did succeed.

"For several agonizing moments, I watched as OpenAI's artificially intelligent agent slowly navigated the internet like someone who's had the web described to them in great detail but never actually used it."

"It asks so many follow-up questions that it negated any time saved."

The user interface for Operator (Image: OpenAI)

An AI-enthusiast Reddit user was also among the first to gain access to the tool, and took to the platform to share their experience:

"Operator is quite simply too slow, expensive, and error-prone."

"It hallucinated worse than GPT-3"

Naturally, the discussion around autonomous AI raises questions about job displacement. At CES 2025, Jensen Huang famously proclaimed that "IT will become the HR of AI agents." Sam Altman and Mark Zuckerberg are also openly bullish on AI agent capabilities. However, for every bold claim about AI agents replacing workforces, there are reminders of their current limitations.

The 'world's first AI software engineer', Devin, was similarly touted as a paradigm shift in the programming space. Following its release, both users and researchers debunked some of the tool's claims, pointing to its failure to complete 85% of assigned tasks and offering observations such as:

"Devin didn't complete the advertised task. Instead, it generated errors in its own code and then fixed them" - Carl Brown, Software Developer

"Even more concerning was Devin's tendency to press forward with tasks that weren't actually possible." - Answer AI, Research Team

While the claims surrounding AI agents may eventually prove true, practical application, rather than speculation, will be the real determining factor.