Mark Zuckerberg’s Meta isn't the only company developing an AI-powered program that can generate video out of text inputs. Google has been working on one, too.
On Wednesday, researchers at the company’s AI lab, Google Brain, debuted Imagen Video, a program that can create realistic-looking video clips from a text input. The system expands Google’s original Imagen program by moving beyond still images to moving pictures, resulting in creative videos that remain largely consistent throughout each frame.
“We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding,” Google researchers wrote in a paper.
Imagen Video can create 5.3-second videos at a resolution of 1,280 by 768 pixels, running at 24 frames per second. Google’s researchers developed the program by training its computer models on videos and still images that were already labeled with text descriptions. When given a text prompt, Imagen Video then tries to reproduce that kind of imagery in the form of a video.
“While training on natural video data only enables the model to learn dynamics in natural settings, the model can learn about different image styles (such as sketch, painting, etc.) by training on images,” the paper added. “As a result, this joint training enables the model to generate interesting video dynamics in different styles.”
In total, Imagen Video was trained on an “internal dataset” made up of 14 million videos and 60 million still images, along with
Read more on pcmag.com