AI startups that aren’t OpenAI are plugging away this week, it seems, sticking to their product roadmaps even as coverage of the OpenAI mess dominates the airwaves.
See: Stability AI, which this afternoon announced Stable Video Diffusion, an AI model that generates videos by animating existing images. Based on Stability’s existing Stable Diffusion text-to-image model, Stable Video Diffusion is one of the few video-generating models available in open source, or commercially, for that matter.
But not to everyone.
Given how other AI research previews, including Stability’s own, have gone historically, this writer would not be surprised to see the model begin to circulate on the dark web in short order. If that happens, I would be concerned about the ways in which Stable Video Diffusion could be abused, as it does not appear to have a built-in content filter. When Stable Diffusion was released, it wasn’t long before actors with questionable intentions were using it to create nonconsensual deepfake porn, and worse.
But I digress.
Stable Video Diffusion actually comes in the form of two models, SVD and SVD-XT. The first, SVD, converts still images into 576×1024 videos of 14 frames. SVD-XT uses the same architecture but increases the frame count to 24. Both can generate videos at between 3 and 30 frames per second.
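As a rough illustration of how those frame counts and frame rates translate into clip length, the arithmetic can be sketched in Python. This is a back-of-the-envelope sketch, not anything taken from Stability's code: the `clip_seconds` helper and the `FRAME_COUNTS` and `FPS_RANGE` names are hypothetical, and the values are simply the figures quoted above.

```python
# Hypothetical helper for illustration: clip length = frame count / frame rate.
# The numbers below are the figures quoted in this article, not values read
# from Stability's models or code.
FRAME_COUNTS = {"SVD": 14, "SVD-XT": 24}
FPS_RANGE = (3, 30)  # lowest and highest frame rates quoted in this article


def clip_seconds(frames: int, fps: float) -> float:
    """Return the duration in seconds of `frames` frames played at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


if __name__ == "__main__":
    for name, frames in FRAME_COUNTS.items():
        shortest = clip_seconds(frames, FPS_RANGE[1])  # fastest playback
        longest = clip_seconds(frames, FPS_RANGE[0])   # slowest playback
        print(f"{name}: {shortest:.2f}s to {longest:.2f}s")
```

Under these assumptions, a 24-frame clip played at 6 frames per second lasts four seconds, and a 14-frame clip played at the same rate lasts a little over two.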
According to a white paper released alongside Stable Video Diffusion, SVD and SVD-XT were initially trained on a data set of millions of videos and then “fine-tuned” on a much smaller set of hundreds of thousands to around a million clips. Where those videos came from isn’t immediately clear (the paper implies that many came from public research data sets), so it’s impossible to know whether any were under copyright. If they were, it could open Stability and Stable Video Diffusion’s users to legal and ethical challenges regarding usage rights. Time will tell.
Regardless of the source of the training data, the models, both SVD and SVD-XT, generate relatively high-quality four-second clips. By this writer’s estimation, the cherry-picked samples on Stability’s blog could go toe-to-toe with outputs from Meta’s recent video generation model as well as the AI-produced examples we’ve seen from Google and AI startups Runway and Pika Labs.
But Stable Video Diffusion has limitations. Stability is transparent about this, writing on the models’ Hugging Face pages (the pages from which researchers can apply to access Stable Video Diffusion) that the models can’t generate videos without motion or slow camera pans, be controlled by text, render legible text, or consistently generate faces and people “properly.”
Still, while it’s early days, Stability notes that the models are quite extensible and can be adapted to use cases such as generating 360-degree views of objects.
So what might Stable Video Diffusion become? Well, Stability says it is planning “a variety” of models that “build on and extend” SVD and SVD-XT, as well as a “text-to-video” tool that will bring text prompting to the models on the web. The end goal appears to be commercialization: Stability rightly notes that Stable Video Diffusion has potential applications in “advertising, education, entertainment and beyond.”
Certainly, Stability is gunning for a hit as investors in the startup ramp up the pressure.
In April, Semafor reported that Stability AI was burning through cash, prompting an executive search to ramp up sales. According to Forbes, the company has repeatedly delayed or outright failed to pay wages and payroll taxes, leading AWS, which Stability uses for the compute to train its models, to threaten to revoke Stability’s access to its GPU instances.
Stability AI recently raised $25 million through a convertible note (i.e., debt that converts to equity), bringing its total raised to over $125 million. But it hasn’t closed new funding at a higher valuation; the startup was last valued at $1 billion. Stability is said to be looking to quadruple that valuation in the next few months, despite stubbornly low revenues and a high burn rate.
Stability suffered another blow recently with the departure of Ed Newton-Rex, who had been the startup’s VP of audio for just over a year and played a key role in launching Stability’s music-generating tool, Stable Audio. In a public letter, Newton-Rex said he left Stability over a disagreement about copyright and how copyrighted data should, and should not, be used to train AI models.