What is OpenAI’s Sora? The text-to-video tool explained and when you might be able to use it

ChatGPT maker OpenAI has now unveiled Sora, its artificial intelligence engine for converting text prompts into video. Think Dall-E (also developed by OpenAI), but for movies rather than static images.

It's still very early days for Sora, but the AI model is already generating a lot of buzz on social media, with multiple clips doing the rounds – clips that look as if they've been put together by a team of actors and filmmakers.

Here we'll explain everything you need to know about OpenAI Sora: what it's capable of, how it works, and when you might be able to use it yourself. The era of AI text-prompt filmmaking has now arrived.

OpenAI Sora release date and price

In February 2024, OpenAI Sora was made available to “red teamers” – that's people whose job it is to test the security and stability of a product. OpenAI has also now invited a select number of visual artists, designers, and movie makers to test out the video generation capabilities and provide feedback.

“We're sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon,” says OpenAI.

In other words, the rest of us can't use it yet. For the time being there's no indication as to when Sora might become available to the wider public, or how much we'll have to pay to access it. 

Two dogs on a mountain podcasting (Image credit: OpenAI)

We can make some rough guesses about the timescale based on what happened with ChatGPT. Before that AI chatbot was released to the public in November 2022, it was preceded by InstructGPT earlier that year. OpenAI's DevDay event also typically takes place in November.

It's certainly possible, then, that Sora could follow a similar pattern and launch to the public at a similar time in 2024. But this is currently just speculation and we'll update this page as soon as we get any clearer indication about a Sora release date.

As for price, we similarly don't have any hints about how much Sora might cost. As a guide, ChatGPT Plus – which offers access to the newest Large Language Models (LLMs) and Dall-E – currently costs $20 (about £16 / AU$30) per month.

But generating a video with Sora demands significantly more compute power than, for example, generating a single image with Dall-E, and the process also takes longer. So it isn't yet clear how well Sora, which is for now effectively a research preview, might convert into an affordable consumer product.

What is OpenAI Sora?

You may well be familiar with generative AI models – such as Google Gemini for text and Dall-E for images – which can produce new content based on vast amounts of training data. If you ask ChatGPT to write you a poem, for example, what you get back will be based on lots and lots of poems that the AI has already absorbed and analyzed.

OpenAI Sora is a similar idea, but for video clips. You give it a text prompt, like “woman walking down a city street at night” or “car driving through a forest” and you get back a video. As with AI image models, you can get very specific when it comes to saying what should be included in the clip and the style of the footage you want to see.


To get a better idea of how this works, check out some of the example videos posted by OpenAI CEO Sam Altman – not long after Sora was unveiled to the world, Altman responded to prompts put forward on social media, returning videos based on text like “a wizard wearing a pointed hat and a blue robe with white stars casting a spell that shoots lightning from his hand and holding an old tome in his other hand”.

How does OpenAI Sora work?

On a simplified level, the technology behind Sora is the same technology that lets you search for pictures of a dog or a cat on the web. Show an AI enough photos of a dog or cat, and it'll be able to spot the same patterns in new images; in the same way, if you train an AI on a million videos of a sunset or a waterfall, it'll be able to generate its own.

Of course, there's a lot of complexity underneath that, and OpenAI has provided a deep dive into how its AI model works. It's trained on “internet-scale data” to learn what realistic videos look like, first analyzing clips to understand what it's looking at, then learning how to produce its own versions when asked.

So, ask Sora to produce a clip of a fish tank, and it'll come back with an approximation based on all the fish tank videos it's seen. It makes use of what are known as visual patches, smaller building blocks that help the AI to understand what should go where and how different elements of a video should interact and progress, frame by frame.
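
OpenAI hasn't released Sora's code, but the patch idea itself is simple enough to sketch. Here's a minimal, hypothetical Python example that chops a video into fixed-size spacetime patches – the sizes and function names are our own illustrative guesses, not Sora's actual parameters:

```python
import numpy as np

def extract_spacetime_patches(video, patch_t=4, patch_h=16, patch_w=16):
    """Split a video into spacetime patches.

    video: array of shape (frames, height, width, channels).
    The patch sizes here are illustrative guesses, not Sora's real values.
    """
    T, H, W, C = video.shape
    # Trim so the video divides evenly into whole patches
    T, H, W = T - T % patch_t, H - H % patch_h, W - W % patch_w
    video = video[:T, :H, :W]
    patches = (
        video.reshape(T // patch_t, patch_t,
                      H // patch_h, patch_h,
                      W // patch_w, patch_w, C)
             .transpose(0, 2, 4, 1, 3, 5, 6)               # group the patch axes together
             .reshape(-1, patch_t * patch_h * patch_w * C)  # one flattened row per patch
    )
    return patches  # each row is a "token" a transformer can attend over

# Example: a 16-frame, 64x64 RGB clip becomes a sequence of patch tokens
clip = np.random.rand(16, 64, 64, 3)
tokens = extract_spacetime_patches(clip)
print(tokens.shape)  # (64, 3072): 4x4x4 patches, each flattened
```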

Sora starts messier, then gets tidier (Image credit: OpenAI)

Sora is based on a diffusion model, where the AI starts with a 'noisy' response and then works towards a 'clean' output through a series of feedback loops and prediction calculations. You can see this in the frames above, where a video of a dog playing in the snow turns from nonsensical blobs into something that actually looks realistic.
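
Here's a toy Python sketch of that denoising loop, just to make the idea concrete. A real diffusion model uses a trained neural network to predict the noise at each step; our predict_noise stand-in 'cheats' with a known target so the example runs end to end:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend this is the clean frame the model should produce
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0  # a simple square "subject"

def predict_noise(sample):
    # Hypothetical stand-in for a trained network: in a real diffusion
    # model, a neural net estimates the noise in the current sample.
    return sample - target

sample = rng.normal(size=(8, 8))  # step 0: pure noise
for step in range(50):
    noise_estimate = predict_noise(sample)
    sample = sample - 0.1 * noise_estimate  # remove a little noise each step

print(np.abs(sample - target).mean())  # near 0: noise has become a clean frame
```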

And like other generative AI models, Sora uses transformer technology (the last T in ChatGPT stands for Transformer). Transformers use a variety of sophisticated data analysis techniques to process heaps of data – they can understand the most important and least important parts of what's being analyzed, and figure out the surrounding context and relationships between these data chunks.
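
That weighting comes from the attention mechanism at the heart of every transformer. Below is a minimal NumPy sketch of textbook scaled dot-product attention – Sora's real architecture is vastly bigger and its details are unpublished, so treat this purely as an illustration:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard attention: each query softly selects the values whose
    keys it matches best, which is how a transformer decides which
    parts of the input matter most for each output position."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of every query to every key
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: importance weights
    return weights @ V                            # weighted mix of the values

# Example: self-attention over 4 patch tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape)  # (4, 8)
```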

What we don't fully know is where OpenAI sourced its training data – it hasn't said which video libraries have been used to power Sora, though we do know it has partnerships with content databases such as Shutterstock. In some cases, you can see similarities between the training data and the output Sora produces.

What can you do with OpenAI Sora?

At the moment, Sora is capable of producing HD videos of up to a minute, without any sound attached, from text prompts. If you want to see some examples of what's possible, we've put together a list of 11 mind-blowing Sora shorts for you to take a look at – including fluffy Pixar-style animated characters and astronauts with knitted helmets.

“Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt,” says OpenAI, but that's not all. It can also generate videos from still images, fill in missing frames in existing videos, and seamlessly stitch multiple videos together. It can create static images too, or produce endless loops from clips provided to it.

It can even produce simulations of video games such as Minecraft, again based on vast amounts of training data that teach it what a game like Minecraft should look like. We've already seen a demo where Sora is able to control a player in a Minecraft-style environment, while also accurately rendering the surrounding details.

OpenAI does acknowledge some of the limitations of Sora at the moment. The physics don't always make sense, with people disappearing or transforming or blending into other objects. Sora isn't mapping out a scene with individual actors and props; it's making an incredible number of calculations about where pixels should go from frame to frame.

In Sora videos people might move in ways that defy the laws of physics, or details – such as a bite being taken out of a cookie – might not be remembered from one frame to the next. OpenAI is aware of these issues and is working to fix them, and you can check out some of the examples on the OpenAI Sora website to see what we mean.

Despite those bugs, OpenAI hopes that further down the line Sora could evolve into a realistic simulator of both physical and digital worlds. In the years to come, the Sora tech could be used to generate imaginary virtual worlds for us to explore, or let us fully explore real places that have been replicated in AI.

How can you use OpenAI Sora?

At the moment, you can't get into Sora without an invite: it seems as though OpenAI is picking out individual creators and testers to help get its video generation model ready for a full public release. How long this preview period will last, whether it's months or years, remains to be seen – but OpenAI has previously shown a willingness to move as fast as possible with its AI projects.

Based on the existing technologies that OpenAI has made public – Dall-E and ChatGPT – it seems likely that Sora will initially be available as a web app. Since its launch, ChatGPT has got smarter and added new features, including custom bots, and it's likely that Sora will follow the same path when it launches in full.

Before that happens, OpenAI says it wants to put some safety guardrails in place: you're not going to be able to generate videos showing extreme violence, sexual content, hateful imagery, or celebrity likenesses. There are also plans to combat misinformation by including metadata in Sora videos that indicates they were generated by AI.


OpenAI’s new Sora text-to-video model can make shockingly realistic content

OpenAI has broken new ground, revealing its first text-to-video model, Sora, which is capable of creating shockingly realistic content.

We've been wondering when the company would finally release its own video engine, as so many of its rivals, from Stability AI to Google, have beaten it to the punch. Perhaps OpenAI wanted to get things just right before a proper launch. At this rate, the quality of its output could eclipse that of its contemporaries. According to the official page, Sora can generate “realistic and imaginative scenes” from a single text prompt, much like other text-to-video AI models. The difference with this engine is the technology behind it.

Lifelike content

OpenAI claims its artificial intelligence can understand how people and objects “exist in the physical world”. This gives Sora the ability to create scenes featuring multiple people, varying types of movement, facial expressions, textures, and objects with a high amount of detail. Generated videos lack the plastic look and the nightmarish forms seen in other AI content – for the most part, but more on that later.

Sora is also multimodal. Users will reportedly be able to upload a still image to serve as the basis of a video. The content inside the picture will become animated, with a lot of attention paid to the small details. It can even take a pre-existing video “and extend it or fill in missing frames.”


You can find sample clips on OpenAI's website and on X (the platform formerly known as Twitter). One of our favorites features a group of puppies playing in the snow. If you look closely, you can see that their fur and the snow on their snouts have a strikingly lifelike quality. Another great clip shows a Victoria crowned pigeon bobbing around like an actual bird.

A work in progress

As impressive as these two videos may be, Sora is not perfect. OpenAI admits its “model has weaknesses.” It can have a hard time simulating the physics of an object, can confuse left and right, and can misunderstand “instances of cause and effect.” You can have an AI character bite into a cookie, for example, but the cookie may lack a bite mark afterward.

It makes a lot of weird errors too. One of the funnier mishaps involves a group of archeologists unearthing a large piece of paper, which then transforms into a chair before ending up as a crumpled piece of plastic. The AI also seems to have trouble with words: “Otter” is misspelled as “Oter”, and “Land Rover” becomes “Danover”.


Moving forward, the company will be working with its “red teamers”, a group of industry experts, “to assess critical areas for harms or risks.” It wants to make sure Sora doesn't generate false information, hateful content, or anything showing bias. Additionally, OpenAI is going to implement a text classifier to reject prompts that violate its policy, including inputs requesting sexual content, violent videos, and celebrity likenesses, among other things.
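
OpenAI hasn't described Sora's classifier in any detail, but its publicly documented Moderation API already does this kind of screening for other products. As a rough illustration (and only an illustration – this isn't Sora's actual pipeline), a prompt gate built on that API might look like this:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def prompt_allowed(prompt: str) -> bool:
    """Screen a prompt with OpenAI's Moderation API before generation.
    Illustrative only; Sora's real classifier has not been made public."""
    result = client.moderations.create(input=prompt)
    return not result.results[0].flagged

if prompt_allowed("a wizard casting a lightning spell"):
    print("Prompt passes the safety check; send it to the video model.")
else:
    print("Prompt rejected by the moderation filter.")
```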

There's no word yet on when Sora will officially launch. We've reached out to OpenAI for information on the release and will update this story when we hear back. In the meantime, check out TechRadar's list of the best AI video editors for 2024.
