Date: 2025-09-03

NVIDIA is developing the next generation of AI by teaching it the fundamentals of human behavior, starting with something as basic as making toast.

TL;DR: NVIDIA is advancing AI by teaching common sense through its Cosmos Reason model, enhancing physical reasoning for robotics, autonomous vehicles, and smart spaces. Using reinforcement learning and real-world footage, the AI gains essential understanding of physical interactions, crucial for safe and effective real-world applications.

NVIDIA has detailed in a recent press release that it intends to teach AI what seems obvious to humans: common sense. The company says that visual AI models currently lack this understanding, and if physical AI is ever to come to the real world, it will need to have a grasp on what humans deem common sense.

Common sense, or the basic understanding that humans develop through real-world experiences, can't be organically learned by AI; the models have to be specifically taught it. In order to teach AI models common sense, a series of tests was developed to coach them on the limitations of the physical world.

For example, NVIDIA's Cosmos Reason model, an open reasoning vision language model (VLM) used for physical AI applications such as robotics, autonomous vehicles, and smart spaces, currently tops the physical reasoning (common sense) leaderboard.

How did NVIDIA do this? The company explains that the model has to start small, learning about the physical world through reinforcement learning. The video above, for instance, shows a sample from Cosmos Reason's evaluation dataset in which the AI model is asked to analyze the physical world in the footage.

The model is asked, "What is the relative motion of the vehicles seen in the background?" It then watches the footage and picks an answer from four choices. NVIDIA analysts review its answer, and the model is reinforced with the correct one.
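As a rough illustration of the multiple-choice setup described above, the scoring step can be sketched in Python. This is a minimal sketch under stated assumptions, not NVIDIA's actual training code: the names `MultipleChoiceItem`, `reward`, and `mean_reward` are hypothetical, the four answer options are invented placeholders (the article quotes only the question), and the 1.0/0.0 reward is one simple way to turn a right or wrong pick into a reinforcement signal.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MultipleChoiceItem:
    """One evaluation item: a question about a clip plus its answer options."""
    question: str
    choices: Sequence[str]
    correct_index: int  # position of the correct option in `choices`


def reward(item: MultipleChoiceItem, chosen_index: int) -> float:
    """Return 1.0 for the correct option, 0.0 for a wrong or invalid pick."""
    if not 0 <= chosen_index < len(item.choices):
        return 0.0
    return 1.0 if chosen_index == item.correct_index else 0.0


def mean_reward(items: Sequence[MultipleChoiceItem], answers: Sequence[int]) -> float:
    """Average reward over a batch of items and the model's chosen indices."""
    if len(items) != len(answers):
        raise ValueError("items and answers must have the same length")
    if not items:
        raise ValueError("at least one item is required")
    return sum(reward(i, a) for i, a in zip(items, answers)) / len(items)


# The question is quoted in the article; the options below are hypothetical.
EXAMPLE_ITEM = MultipleChoiceItem(
    question="What is the relative motion of the vehicles seen in the background?",
    choices=(
        "They are stationary",          # hypothetical option
        "They move toward the camera",  # hypothetical option
        "They move left to right",      # hypothetical option
        "They move away from the camera",  # hypothetical option
    ),
    correct_index=2,  # assumed for illustration only
)
```

Calling `reward(EXAMPLE_ITEM, 2)` scores a correct pick and `reward(EXAMPLE_ITEM, 0)` a wrong one; in a reinforcement learning loop, a signal of this kind would be what nudges the model toward the correct answer.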

"Distilling human common sense about the physical world into models is how NVIDIA is bringing about the next generation of AI," writes NVIDIA.

Since AI needs to learn how the world works in the most basic sense, any piece of footage has become highly valuable for training visual models, which will eventually be the underlying technology powering humanoid robots, autonomous vehicles, and other forms of physical technology that interact directly with the real world.