Researchers at Chinese AI company ByteDance have unveiled their latest AI video model: OmniHuman. This end-to-end AI framework can generate realistic human videos from a single image, producing life-like movements and natural gestures while seamlessly integrating different audio and video inputs.
Several examples of OmniHuman in action have surfaced on X, featuring figures like Taylor Swift, Albert Einstein, and NVIDIA CEO Jensen Huang. Notable demonstrations include:
- Taylor Swift, animated to sing along to Blue Bird by Ikimono-gakari.
- Albert Einstein, lip-synced to an English-language audio clip discussing life and emotions.
- Jensen Huang, rapping the lyrics to the Chinese hip-hop track 野狼Disco (Wild Wolf Disco).
OmniHuman's capabilities are particularly striking. The Albert Einstein video, for example, was recreated from a still portrait taken in 1921. Slight shimmering is visible in the recreation of the famous physicist, but the output is very convincing, from the animation of his hands and eyes to the lip-syncing of the still image to the provided audio.

Credit: OmniHuman
How It Works
OmniHuman uses a "multimodality motion conditioning mixed training" technique, allowing users to submit a single image, alongside an audio or video clip to drive the motion, and generate a life-like video. It works with portraits, half-body, and full-body images and can even animate non-human subjects such as cartoons and animals. Potential commercial applications span entertainment (AI-generated acting), education (teaching materials), and retail (personalized shopping experiences).
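Since the model has not been released, any code is necessarily speculative. The minimal Python sketch below is a hypothetical illustration of the input/output relationship only: a single reference image plus a per-frame motion signal (here, audio features) goes in, and a sequence of video frames comes out. All names, shapes, and interfaces are assumptions for illustration, not ByteDance's actual API.

```python
# Hypothetical sketch only: OmniHuman's code and API are not public, so every
# name below is a placeholder showing the general shape of an image-plus-audio
# driven video pipeline, not ByteDance's implementation.
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Conditioning:
    reference_image: np.ndarray                  # single still image, H x W x 3
    audio_features: np.ndarray                   # per-frame audio features, T x D (drives lip sync)
    pose_sequence: Optional[np.ndarray] = None   # optional driving poses, T x J x 2


def generate_video(cond: Conditioning) -> np.ndarray:
    """Stand-in generator: returns one frame per audio step, shaped like the input image.

    A real model would synthesize new frames whose identity comes from the
    reference image and whose motion follows the audio (and pose, if given);
    here we simply tile the reference image to show the expected shapes.
    """
    num_frames = cond.audio_features.shape[0]
    return np.repeat(cond.reference_image[None, ...], num_frames, axis=0)


if __name__ == "__main__":
    cond = Conditioning(
        reference_image=np.zeros((512, 512, 3), dtype=np.uint8),
        audio_features=np.zeros((250, 80), dtype=np.float32),  # ~10 s of audio at 25 fps
    )
    video = generate_video(cond)
    print(video.shape)  # (250, 512, 512, 3): one frame per audio step
```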
Currently, OmniHuman remains in the research phase and is not publicly available. However, ByteDance has indicated that a code release is planned for the near future.