Researchers have plugged 500 million years' worth of evolutionary data into an AI and asked it to create genetic code. The results were surprising: the AI produced protein sequences that researchers had never seen before.

The project is headed by scientists at EvolutionaryScale and the Arc Institute, and a new paper published in the journal Science details their findings. According to the team, the new AI model, called ESM3, can generate brand-new protein sequences that researchers can use to develop a deeper understanding of how proteins work, with eventual applications in scientific fields such as health.
Like most AI tools, the underlying technology needs large amounts of data to function. To create ESM3, the team trained the AI on 771 billion tokens derived from 3.15 billion protein sequences, 236 million protein structures, and 539 million protein annotations.

According to the team, feeding this much data into the AI was like stuffing 500 million years' worth of evolutionary data and knowledge into a single system. The end result is an AI that can simulate evolution and, in particular, has created a virtual protein, which researchers have named esmGFP, with a previously unseen genetic sequence. With advanced AI systems capable of generating previously unseen genetic codes, researchers hope to learn more about the evolution of human biology.