OpenAI’s impressive new Sora videos show it has serious sci-fi potential

OpenAI's Sora, its equivalent of image creation but for videos, made huge shockwaves in the swiftly advancing world of AI last month, and we’ve just caught a few new videos which are even more jaw-slackening than what we have already been treated to.

In case you somehow missed it, Sora is a text-to-video AI meaning you can write a simple request and it’ll compose a video (just as image generation previously worked, but obviously a much more complex endeavor).

An eye with the iris being a globe

(Image credit: OpenAI)

Now OpenAI’s Sora research lead Tim Brooks has released some new content generated by Sora on X (formerly Twitter). 

This is Sora’s crack at fulfilling the following request: “Fly through tour of a museum with many paintings and sculptures and beautiful works of art in all styles.”

Pretty impressive to say the least. On top of that, Bill Peebles, also a Sora research lead, showed us a clip generated from the following prompt: “An alien blending in naturally with new york city, paranoia thriller style, 35mm film.”

An alien character walking through a street

(Image credit: OpenAI)

Content creator Blaine Brown then stepped in to embellish the above clip, cutting it to repeat the footage and make it longer, while having the alien rapping, complete with lip-syncing. The music is generated by Suno AI by the way (with the lyrics written by Brown, mind), and lip-syncing is done with Pika Labs AI.

See more

Analysis: Still early days for Sora

Two people having dinner

(Image credit: OpenAI)

It’s worth underlining how fast things seem to be progressing with the capabilities of AI. Image creation powers were one thing – and extremely impressive in themselves – but this is entirely another. Especially when you remember that Sora is still just in testing at OpenAI, with a limited set of ‘red teamers’ (testers hunting out bugs and smoothing over those wrinkles).

The camera work in the museum fly-through flows realistically and feels nicely imaginative in the way it swoops around (albeit with the occasional judder). And the last tweet shows how you can take a base clip and flesh it out with content including AI-generated music.

Of course, AI can write a script as well, and so it begs the question: how long will it be before a blue alien is starring in an AI-generated post-apocalyptic drama. Or an (unintentional) comedy perhaps?

You get the idea, and we’re getting carried away, of course, but still – what AI could be capable of in just a few years is potentially mind-blowing, frankly.

Naturally, we’ll be seeing the cream of the crop of what Sora is capable of in these teasers, and there have been some buggy and weird efforts aired too. (Just as when ChatGPT and other AI chatbots first rolled onto the scene, we saw AI hallucinations and general unhinged behavior and replies).

Perhaps the broader worry with Sora, though, is how this might eventually displace, rather than assist, content creators. But that’s a fear to chew over on another day – not forgetting the potential for misuse with AI-created videos which we recently discussed in more depth here.

You might also like

TechRadar – All the latest technology news

Read More

Google’s impressive Lumiere shows us the future of making short-form AI videos

Google is taking another crack at text-to-video generation with Lumiere, a new AI model capable of creating surprisingly high-quality content. 

The tech giant has certainly come a long way from the days of Imagen Video. Subjects in Lumiere videos are no longer these nightmarish creatures with melting faces. Now things look much more realistic. Sea turtles look like sea turtles, fur on animals has the right texture, and people in AI clips have genuine smiles (for the most part). What’s more, there's very little of the weird jerky movement seen in other text-to-video generative AIs. Motion is largely smooth as butter. Inbar Mosseri, Research Team Lead at Google Research, published a video on her YouTube channel demonstrating Lumiere’s capabilities. 

Google put a lot of work into making Lumiere’s content appear as lifelike as possible. The dev team accomplished this by implementing something called Space-Time U-Net architecture (STUNet). The technology behind STUNet is pretty complex. But as Ars Technica explains, it allows Lumiere to understand where objects are in a video, how they move and change and renders these actions at the same time resulting in a smooth-flowing creation. 

This runs contrary to other generative platforms that first establish keyframes in clips and then fill in the gaps afterward. Doing so results in the jerky movement the tech is known for.

Well equipped

In addition to text-to-video generation, Lumiere has numerous features in its toolkit including support for multimodality. 

Users will be able to upload source images or videos to the AI so it can edit them according to their specifications. For example, you can upload an image of Girl with a Pearl Earring by Johannes Vermeer and turn it into a short clip where she smiles instead of blankly staring. Lumiere also has an ability called Cinemagraph which can animate highlighted portions of pictures.

Google demonstrates this by selecting a butterfly sitting on a flower. Thanks to the AI, the output video has the butterfly flapping its wings while the flowers around it remain stationary. 

Things become particularly impressive when it comes to video. Video Inpainting, another feature, functions similarly to Cinemagraph in that the AI can edit portions of clips. A woman’s patterned green dress can be turned into shiny gold or black. Lumiere goes one step further by offering Video Stylization for altering video subjects. A regular car driving down the road can be turned into a vehicle made entirely out of wood or Lego bricks.

Still in the works

It’s unknown if there are plans to launch Lumiere to the public or if Google intends to implement it as a new service. 

We could perhaps see the AI show up on a future Pixel phone as the evolution of Magic Editor. If you’re not familiar with it, Magic Editor utilizes “AI processing [to] intelligently” change spaces or objects in photographs on the Pixel 8. Video Inpainting, to us, seems like a natural progression for the tech.

For now, it looks like the team is going to keep it behind closed doors. As impressive as this AI may be, it still has its issues. Jerky animations are present. In other cases, subjects have limbs warping into mush. If you want to know more, Google’s research paper on Lumiere can be found on Cornell University’s arXiv website. Be warned: it's a dense read.

And be sure to check out TechRadar's roundup of the best AI art generators for 2024.

You might also like

TechRadar – All the latest technology news

Read More