Stability AI, the developer behind Stable Diffusion, is previewing a new generative AI model that can create short-form videos from a text prompt.
Aptly called Stable Video Diffusion, it consists of two AI models (known as SVD and SVD-XT) and is capable of creating clips at a 576 x 1,024 pixel resolution. Users will be able to set the frame rate anywhere between three and 30 FPS. The length of the videos depends on which of the twin models is chosen. If you select SVD, the clip will run for 14 frames, while SVD-XT extends that a bit to 25 frames. The difference doesn’t matter too much, as rendered clips will only play for about four seconds before ending, according to the official listing on Hugging Face.
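For a rough sense of how frame count and frame rate trade off, playback time is simply the number of frames divided by the FPS. The snippet below is our own back-of-the-envelope sketch in Python, not anything published by Stability AI. The function name `clip_duration_seconds` and the constants are hypothetical, and the frame counts and FPS range are the figures quoted above.

```python
# Back-of-the-envelope arithmetic only; this does not call any Stability AI code.
# Frame counts and the FPS range are the figures quoted in the article.
FRAME_COUNTS = {"SVD": 14, "SVD-XT": 25}
MIN_FPS, MAX_FPS = 3, 30


def clip_duration_seconds(frames: int, fps: float) -> float:
    """Return the playback time in seconds of a clip with `frames` frames at `fps`."""
    if not MIN_FPS <= fps <= MAX_FPS:
        raise ValueError(f"fps must be between {MIN_FPS} and {MAX_FPS}, got {fps}")
    return frames / fps


# Print the playback time for each model at the slowest and fastest settings.
for model, frames in FRAME_COUNTS.items():
    for fps in (MIN_FPS, MAX_FPS):
        seconds = clip_duration_seconds(frames, fps)
        print(f"{model}: {frames} frames at {fps} FPS = {seconds:.2f} s")
```

By this arithmetic, SVD's 14 frames at the minimum three FPS work out to roughly 4.7 seconds, while SVD-XT's 25 frames at 30 FPS last under a second.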
The company posted a video on its YouTube channel showing off what Stable Video Diffusion is capable of, and the clips are surprisingly high quality. They're certainly not the nightmare fuel you see from other AI models like Meta’s Make-A-Video. The most impressive, in our opinion, has to be the Ice Dragon demo. You can see a great deal of detail in the dragon’s scales, plus the mountains in the back look like something out of a painting. Animation, as you can imagine, is rather limited, as the subject can only slowly bob its head. The same can be seen in other demos: it’s either a stiff walking cycle or a slow panning shot.
In the early stages
Limitations don’t stop there. Stable Video Diffusion reportedly cannot “achieve perfect photorealism,” can’t generate “legible text,” and has a tough time with faces. Another demonstration on Stability AI’s website does show the model rendering a man’s face without any weird flaws, so results may vary on a case-by-case basis.
Keep in mind that this project is still in the early stages. It’s obvious the model is not ready for a wide release, nor are there any plans for one. Stability AI emphasizes that Stable Video Diffusion is not meant “for real-world or commercial applications” at this time. In fact, it is currently “intended for research purposes only.” We’re not surprised the developer is being very cautious with its tech. There was an incident last year where the Stable Diffusion model leaked online, leading to bad actors using it to create deepfake images.
If you’re interested in trying out Stable Video Diffusion, you can join a waitlist by filling out a form on the company website. It’s unknown when people will be allowed in, but the preview will include a Text-To-Video interface. In the meantime, you can check out the AI’s white paper and read up on all the nitty-gritty behind the project.
One thing we found interesting after digging through the document is that it mentions using “publicly accessible video datasets” as some of the training material. Again, it's not surprising to hear this, considering that Getty Images sued Stability AI over data scraping allegations earlier this year. It looks like the team is striving to be more careful so it doesn't make any more enemies.
No word on when Stable Video Diffusion will launch. Luckily, there are other options. Be sure to check out TechRadar's list of the best AI video makers for 2023.