In late September, Meta AI unveiled a new research project and showcased clips produced with Make-A-Video, an AI system that generates seconds-long videos from text prompts. Make-A-Video builds on Meta AI's recent progress in generative technology research, including Make-A-Scene, announced last July. Meta AI aims to make the technology publicly available in the near future, and the research paper and results are already available to the community for ongoing feedback as the team refines and evolves its approach to this emerging technology.
Text-to-image models have been the subject of many publications recently, but when it comes to video, the challenge is much more complex: in addition to generating each pixel correctly, the system must also predict how it will evolve. Mark Zuckerberg, in a post on Facebook, states:
“Make-A-Video solves this problem by adding a layer of unsupervised learning that allows the system to understand motion in the physical world and apply it to traditional image text generation.”
Make-A-Video is not the first text-to-video model; CogVideo, for example, was recently introduced by a team of researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence.
The Make-A-Video clip generator model
The model was trained on paired text-image data and on video clips without associated text "to teach it how the world moves." Each clip is captioned with the prompt used to generate the starting image, as below:
Make-A-Video also allows you to turn still images into videos or create variations or extensions of existing videos.
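This reuse of a pretrained text-to-image model for video rests on factorized, "pseudo-3D" layers: a spatial convolution inherited from the image model, followed by a new temporal convolution trained only on unlabeled video. A minimal PyTorch sketch of that idea follows; the class name and dimensions are illustrative, not Meta AI's actual code.

```python
import torch
import torch.nn as nn


class Pseudo3DConv(nn.Module):
    """Illustrative factorized spatiotemporal convolution: a spatial 2D conv
    (as in a pretrained text-to-image model) followed by a temporal 1D conv
    that can be trained on video clips without captions."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.spatial = nn.Conv2d(channels, channels, kernel_size, padding=pad)
        self.temporal = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        # Initialize the temporal conv as the identity, so that at the start
        # of video training the network reproduces the pretrained image
        # model's output frame by frame.
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        b, c, f, h, w = x.shape
        # Apply the spatial conv independently to each frame.
        x = x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        x = self.spatial(x)
        # Apply the temporal conv independently to each pixel location.
        x = x.reshape(b, f, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, f)
        x = self.temporal(x)
        return x.reshape(b, h, w, c, f).permute(0, 3, 4, 1, 2)
```

With the identity initialization, the layer initially behaves exactly like the per-frame image model, and the temporal weights then learn motion from uncaptioned clips.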
In the paper published by the Meta AI researchers, they report using two datasets (WebVid-10M and HD-VILA-100M) comprising millions of videos, or hundreds of thousands of hours of footage, to train the model.
They acknowledge that the model has limitations: some scenes are blurred, the animations are disjointed, and the rendering of movements such as walking is not yet satisfactory. The resolution of the videos will also need to be improved.
To reduce the risk of harmful content in the generated videos, the research team preemptively removed pornographic content and toxic phrases from the training dataset.
Like other Meta AI research, the project was released as open source along with its announcement. Meta AI states:
“We want to think about how we build new generative AI systems like this. Make-A-Video uses publicly available datasets, which adds an extra level of transparency to the research. We are openly sharing this generative AI research and results with the community for feedback, and we will continue to use our Responsible AI framework to refine and evolve our approach to this emerging technology.”
Translated from Meta AI dévoile Make-A-Video, un modèle de génération de vidéos