From text to video with AI: Zeroscope

What is Zeroscope? 📹

Zeroscope is a text-to-video model built on the text-to-video model from ModelScope, a Chinese AI platform. It generates videos from text alone: we give it an input such as “Blue fish swimming in outer space” and, based on that input, it generates a video that tries to represent it.

Its next version, Zeroscope V2, has now been released, and the V2 XL version scales it up to a larger size (in terms of pixels in the videos). It was trained on almost 10,000 clips, paying attention to how each frame relates to the ones that follow.

I have created a couple of example videos to show in Makiai:

📂 Zeroscope is open source, so the community can inspect its code and build on it with new ideas and implementations, which in practice tends to speed up development.

GEN2 by Runway vs Zeroscope ⚔️

Runway is a commercial AI-powered video and photo editing tool. It released the pioneering GEN1 and then GEN2, a tool that generates video from text. Zeroscope is currently its open-source rival, since Runway has not released the code of its tools to the public.

How to use Zeroscope 🍣

You can use Zeroscope in several ways that I will explain here:

  • Replicate: Replicate is a website that hosts APIs and playgrounds for AI models. There you can use the Zeroscope model for free through either the API or the web interface; after a few runs you will have to pay.
  • HuggingFace: In principle it is free, but if you use it heavily you will have to pay. Note: This link points to the earlier Zeroscope version, not the latest (V2-XL).
  • On your PC: You can install Zeroscope on your computer. The model needs at least 8 GB of RAM to run, and installing it is not for the impatient.
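
For the local route, here is a minimal sketch in Python of what running the model might look like. It rests on several assumptions that are not part of this article: that the `torch`, `diffusers` and `accelerate` packages are installed, that the checkpoint is published on Hugging Face under the id `cerspense/zeroscope_v2_576w`, and that 576x320 at 24 frames is a sensible default. `VideoSettings` and `generate_video` are hypothetical names made up for illustration, so treat this as a starting point rather than the official way to run Zeroscope.

```python
from dataclasses import dataclass

# Assumption: Hugging Face model id of the 576x320 Zeroscope V2 checkpoint.
MODEL_ID = "cerspense/zeroscope_v2_576w"


@dataclass(frozen=True)
class VideoSettings:
    """Hypothetical container for the generation settings used below."""
    prompt: str
    num_frames: int = 24
    width: int = 576
    height: int = 320
    num_inference_steps: int = 40

    def __post_init__(self):
        # Fail early, before any model weights are downloaded.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.num_frames < 1:
            raise ValueError("num_frames must be at least 1")
        # Assumption: like most latent diffusion models, sizes should be multiples of 8.
        if self.width % 8 or self.height % 8:
            raise ValueError("width and height must be multiples of 8")


def generate_video(settings: VideoSettings, output_path: str = "zeroscope.mp4") -> str:
    """Generate a clip from text and return the path of the written video file."""
    # Heavy imports stay local so the settings can be checked without a GPU.
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.utils import export_to_video

    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe.enable_model_cpu_offload()  # trades speed for lower GPU memory use

    result = pipe(
        settings.prompt,
        num_inference_steps=settings.num_inference_steps,
        height=settings.height,
        width=settings.width,
        num_frames=settings.num_frames,
    )
    # Recent diffusers releases return a batch of videos, so we take the first.
    # Older releases return the frame list directly; there, use `result.frames`.
    frames = result.frames[0]
    return export_to_video(frames, output_path)
```

With this in place, `generate_video(VideoSettings("Blue fish swimming in outer space"))` would download the weights on first use and write the clip to `zeroscope.mp4`. Keeping the settings in a separate class means a bad prompt or size is rejected before the slow part starts.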

Short-term impact ✨

You don’t have to be very imaginative to see that models like GEN2 or Zeroscope V2 can have a big impact on society at many levels. Right now they are reminiscent of Stable Diffusion or OpenAI’s DALL-E when those took their first steps just a few years ago.

If they follow the same trends as image-related AIs, which is likely, in 1-2 years we will see videos of “decent” quality and coherence generated at longer and longer lengths.

And this may involve changes in society such as:

  • Cinema: It can be used as an additional tool in the creation of movies and series, and it could even be the consumer who directly asks the AI for a movie or series to their liking.
  • Content platforms: Massive amounts of content could be generated by specialists, as well as on-demand, fully customized content.

The only limit is imagination, and the consequences for the economy and society are hard to predict.