NVIDIA is teaching AI human common sense by getting it to make toast

NVIDIA is developing the next generation of AI by teaching it the fundamentals of human behavior, starting with something as basic as making toast.

Tech and Science Editor
TL;DR: NVIDIA is advancing AI by teaching common sense through its Cosmos Reason model, enhancing physical reasoning for robotics, autonomous vehicles, and smart spaces. Using reinforcement learning and real-world footage, the AI gains essential understanding of physical interactions, crucial for safe and effective real-world applications.

NVIDIA has detailed in a recent press release that it intends to teach AI what seems obvious to humans: common sense. The company says that visual AI models currently lack this understanding, and that physical AI will need a grasp of what humans deem common sense before it can operate in the real world.

Common sense, the basic understanding that humans develop through real-world experience, can't be learned organically by AI; models have to be taught it explicitly. To do that, a series of tests was developed to coach them on the limitations of the physical world.

For example, NVIDIA's Cosmos Reason model, an open reasoning vision language model (VLM) used for physical AI applications such as robotics, autonomous vehicles, and smart spaces, currently tops the physical reasoning (common sense) leaderboard.

How did NVIDIA do this? The company explains that the model has to start off small, learning about the physical world through reinforcement learning. In one example from Cosmos Reason's evaluation dataset, the AI model is shown a video and asked to analyze the physical world in the footage.

The model is asked, "What is the relative motion of the vehicles seen in the background?" It then looks at the footage and picks an answer from four choices. NVIDIA analysts review its answer, and the model is reinforced with the correct one.
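To make the question-and-answer loop concrete, here is a minimal Python sketch of how a four-choice answer could be turned into a reward signal. It is an illustration under assumptions, not NVIDIA's training code: the class and function names, the binary 1.0/0.0 reward, and the four answer choices are all hypothetical, since the article only says there are four choices and does not list them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultipleChoiceQuestion:
    """One evaluation item: a prompt, its answer choices, and the correct index."""
    prompt: str
    choices: tuple[str, ...]
    correct_index: int


def reward(question: MultipleChoiceQuestion, picked_index: int) -> float:
    """Return 1.0 for the correct choice and 0.0 otherwise (assumed reward scheme)."""
    if not 0 <= picked_index < len(question.choices):
        raise ValueError("picked_index is outside the list of choices")
    return 1.0 if picked_index == question.correct_index else 0.0


def mean_reward(questions: list[MultipleChoiceQuestion], picks: list[int]) -> float:
    """Average the per-question rewards over a batch of answers."""
    if len(questions) != len(picks):
        raise ValueError("questions and picks must have the same length")
    if not questions:
        return 0.0
    return sum(reward(q, p) for q, p in zip(questions, picks)) / len(questions)


# The prompt comes from the article; the choices below are made up for illustration.
example = MultipleChoiceQuestion(
    prompt="What is the relative motion of the vehicles seen in the background?",
    choices=(
        "They are stationary",
        "They are moving in the same direction",
        "They are moving in opposite directions",
        "They are moving toward the camera",
    ),
    correct_index=1,
)
```

In a reinforcement learning setup, a score like this would be fed back to the model so that correct picks become more likely over time. How NVIDIA actually computes and applies that feedback is not described in the article.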

"Distilling human common sense about the physical world into models is how NVIDIA is bringing about the next generation of AI," writes NVIDIA.

Since AI needs to learn how the world works in the most basic sense, any piece of footage has become highly valuable for training visual models. Those models will eventually be the underlying technology powering humanoid robots, autonomous vehicles, and other forms of physical technology that interact directly with the real world.

"Without basic knowledge about the physical world, a robot may fall down or accidentally break something, causing danger to the surrounding people and environment," said Yin Cui, a Cosmos Reason research scientist at NVIDIA.

News Source: blogs.nvidia.com


Jak joined TweakTown in 2017 and has since reviewed 100s of new tech products and kept us informed daily on the latest science, space, and artificial intelligence news. Jak's love for science, space, and technology, and, more specifically, PC gaming, began at 10 years old. It was the day his dad showed him how to play Age of Empires on an old Compaq PC. Ever since that day, Jak fell in love with games and the progression of the technology industry in all its forms.
