Since the release of DALL-E 2 in 2022, text-to-image generators have been all the rage, and many strong competitors have entered the market. More than a year later, we are at the dawn of a new technology: AI video generation.
Google Research on Tuesday published a research paper on Lumiere, a text-to-video diffusion model that can create highly realistic videos from text prompts and images.
The model was designed to address a key challenge in video generation and synthesis: producing “realistic, diverse, and consistent motion,” according to the paper. As you may have noticed, video generation models typically result in choppy video, whereas Google’s approach provides a more seamless viewing experience, as seen in the video below.
It is a huge upgrade over other models: the resulting clips are not only smooth to watch but also look very realistic. Lumiere achieves this through a spatiotemporal U-Net architecture that generates the entire temporal duration of a video at once, in a single pass.
This differs from existing models, which first synthesize distant keyframes and then fill in the frames between them. According to the paper, that approach inherently makes global temporal consistency difficult to achieve.
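The difference between the two strategies can be illustrated at the level of tensor shapes. The sketch below is not Google's code and uses random arrays in place of learned networks; it only contrasts the keyframe-then-interpolate pipeline with a single-pass approach that downsamples and upsamples the whole clip in both space and time.

```python
import numpy as np

def keyframe_pipeline(num_frames: int, height: int, width: int) -> np.ndarray:
    """Older approach: synthesize sparse, distant keyframes, then fill the
    gaps by temporal interpolation (a stand-in for temporal super-resolution).
    Frames between keyframes can drift, which hurts motion consistency."""
    keyframes = np.random.rand(num_frames // 8 + 1, height, width, 3)
    t = np.linspace(0, keyframes.shape[0] - 1, num_frames)
    lo = np.floor(t).astype(int)
    hi = np.minimum(lo + 1, keyframes.shape[0] - 1)
    frac = (t - lo)[:, None, None, None]
    # Linearly blend neighboring keyframes to produce every in-between frame.
    return (1 - frac) * keyframes[lo] + frac * keyframes[hi]

def single_pass_pipeline(num_frames: int, height: int, width: int) -> np.ndarray:
    """Lumiere-style idea: the model sees the full clip at once. A space-time
    U-Net downsamples the video in BOTH space and time, processes the coarse
    representation, and upsamples back, so all frames are generated jointly."""
    video = np.random.rand(num_frames, height, width, 3)  # full-length input
    coarse = video[::4, ::2, ::2]        # downsample time (4x) and space (2x)
    processed = coarse                   # stand-in for the learned network
    # Upsample back to the full spatiotemporal resolution in one pass.
    return np.repeat(np.repeat(np.repeat(processed, 4, axis=0), 2, axis=1), 2, axis=2)

clip_a = keyframe_pipeline(16, 64, 64)
clip_b = single_pass_pipeline(16, 64, 64)
print(clip_a.shape, clip_b.shape)  # both yield all 16 frames at 64x64
```

In the real model the coarse representation is denoised by a learned diffusion network rather than passed through unchanged, but the key point survives the simplification: every frame exists in the network at the same time, instead of being interpolated afterward.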
Lumiere works much like a regular image generator and accepts a variety of inputs: text-to-video generates a clip from a text prompt, while image-to-video brings an existing photo to life, animating it according to an accompanying prompt.
The model can also add a fun twist to video generation through stylized generation, which takes a single reference image and a user prompt and generates a video in the reference's target style.
In addition to generating videos, the model can apply visual stylizations that modify existing videos to match a prompt, create cinemagraphs that animate a specific region of a photo, and edit existing videos through inpainting, which fills in masked areas of the footage.
In the paper, Google asked a group of testers to pick the videos they considered better in terms of visual quality and motion, comparing Lumiere against other prominent text-to-video diffusion models, including ImagenVideo, Pika, ZeroScope, and Gen-2. The testers did not know which model generated each video.
Google’s model outperformed the others in every category, including text-to-video quality, text-to-video text alignment, and image-to-video quality.
The model is not yet available to the public. However, if you want to learn more or see it in action, visit the Lumiere website, where you will find many demos of the model performing these various tasks.