OpenAI’s new Sora text-to-video model can make shockingly realistic content

OpenAI has broken new ground by revealing its first text-to-video model, called Sora, which is capable of creating shockingly realistic content.

We’ve been wondering when the company would finally release its own video engine, as so many of its rivals, from Stability AI to Google, have beaten it to the punch. Perhaps OpenAI wanted to get things just right before a proper launch; at this rate, the quality of its output could eclipse that of its contemporaries. According to the official page, Sora can generate “realistic and imaginative scenes” from a single text prompt, much like other text-to-video AI models. The difference with this engine is the technology behind it.

Lifelike content

OpenAI claims its artificial intelligence can understand how people and objects “exist in the physical world”. This gives Sora the ability to create scenes featuring multiple people, varying types of movement, facial expressions, textures, and objects with a high level of detail. Generated videos lack the plastic look or the nightmarish forms seen in other AI content, for the most part (more on that later).

Sora is also multimodal. Users will reportedly be able to upload a still image to serve as the basis of a video; the content inside the picture becomes animated, with close attention paid to the small details. The model can even take a pre-existing video “and extend it or fill in missing frames.”


You can find sample clips on OpenAI’s website and on X (the platform formerly known as Twitter). One of our favorites features a group of puppies playing in the snow. If you look closely, you can see their fur and the snow on their snouts have a strikingly lifelike quality to them. Another great clip shows a Victoria-crowned pigeon bobbing around like an actual bird.

A work in progress

As impressive as these two videos may be, Sora is not perfect. OpenAI admits its “model has weaknesses.” It can have a hard time simulating the physics of an object, can confuse left and right, and can misunderstand “instances of cause and effect.” An AI character might bite into a cookie, for example, yet the cookie shows no bite mark afterward.

It makes a lot of weird errors too. One of the funnier mishaps involves a group of archeologists unearthing a large piece of paper which then transforms into a chair before ending up as a crumpled piece of plastic. The AI also seems to have trouble with words. “Otter” is misspelled as “Oter” and “Land Rover” is now “Danover”.


Moving forward, the company will be working with its “red teamers”, a group of industry experts, “to assess critical areas for harms or risks.” The aim is to make sure Sora doesn’t generate false information, hateful content, or biased output. Additionally, OpenAI is going to implement a text classifier that rejects prompts violating its policy, including requests for sexual content, violent videos, and celebrity likenesses, among other things.
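
OpenAI hasn’t explained how that classifier will work under the hood. Purely as an illustration of the concept, here’s a minimal sketch of prompt screening built on OpenAI’s existing Moderation endpoint, which already flags sexual, violent, and hateful text; the Sora tie-in and the helper function are our assumptions, not anything OpenAI has confirmed.

```python
# Minimal sketch of prompt screening using OpenAI's Moderation endpoint.
# This only illustrates the general idea of a text classifier that rejects
# policy-violating prompts; how Sora's own filter works is not public.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_prompt_allowed(prompt: str) -> bool:
    """Return False if the Moderation endpoint flags the prompt."""
    result = client.moderations.create(input=prompt).results[0]
    return not result.flagged


if __name__ == "__main__":
    prompt = "A litter of golden retriever puppies playing in the snow"
    if is_prompt_allowed(prompt):
        print("Prompt accepted: it would be passed on to the video model.")
    else:
        print("Prompt rejected for violating the usage policy.")
```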

There’s no word yet on when Sora will officially launch. We’ve reached out to OpenAI for information on the release and will update this story when we hear back. In the meantime, check out TechRadar's list of the best AI video editors for 2024.


Bing Chat can now create more realistic images thanks to DALL-E 3 AI upgrade

Bing Chat has received a substantial update, now integrating OpenAI’s most recent text-to-image model, DALL-E 3. Best of all, it’s available to everyone for free.

As laid out in Microsoft's announcement post, DALL-E 3 is a big upgrade over previous generations because it can produce more “realistic and diverse images” thanks to improvements in three areas.

First, the AI adheres to a text prompt more closely than before when producing content, so Microsoft recommends adding as much detail as possible to ensure the final image sticks close to your vision. Thanks to the extra precision, outputs will also be more coherent, or “logically consistent”; creations from other models like Stable Diffusion can sometimes look downright weird, and Bing's new update improves on this front.

Finally, according to the company, tweaks were made to DALL-E 3 so it can accurately portray unique art styles that match your creative vision.

[Image gallery: Bing Chat DALL-E 3 generation; Bing Chat DALL-E 3 generated hand; Bing Chat DALL-E werewolf; Pixel art parrot. Image credit: Future]

Above are some samples we created ourselves to give you an idea of what the AI can now do. Using the generative engine is simple: head over to either Bing Chat or the Bing Image Creator website, enter a prompt in the text box, wait a few seconds, and you're done.
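
If you’d rather generate images from code than through Bing’s web interface, the same DALL-E 3 model is also available via OpenAI’s own API, which is billed per image rather than free like the Bing route. Below is a minimal sketch; the prompt and size are just illustrative choices.

```python
# Minimal sketch: generating a DALL-E 3 image through OpenAI's API rather
# than Bing's web interface. Requires the `openai` package and an API key;
# unlike the free Bing Image Creator, API usage is billed per image.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",
    prompt="A pixel art parrot perched on a branch, bright colours",
    size="1024x1024",  # DALL-E 3 also accepts 1792x1024 and 1024x1792
    n=1,
)

# The API returns a temporary URL pointing at the generated image.
print(response.data[0].url)
```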

Security upgrade

Besides the performance upgrade, Microsoft has added two security features to Bing Chat aimed at maintaining ethical usage. Every output will come with a Content Credential, an “invisible digital watermark” stating it was generated by Bing Image Creator along with the date and time it was made.

[Image: Content Credential notice. Image credit: Future]

The company is also implementing a “content moderation system” to remove images deemed “harmful or inappropriate”. This includes content “that [contains] nudity, violence, hate speech, or illegal activities.” One thing not mentioned is that you can’t generate pictures featuring famous figures: we asked Bing to create something with President Joe Biden in it, but the request was refused for violating the service’s policy.

Work in progress

As impressive as Bing Chat is now, it is still a work in progress. Like other AI engines, Microsoft’s model still has difficulty drawing hands. It’s not as bad as Stable Diffusion’s gnarled hands from early 2023, but you may notice an extra digit or two. The werewolf image above, for instance, has five fingers on its right hand but only four on the left.

[Image: Generated image of hands with an extra finger. Image credit: Future]

We should also warn you that you may experience some slowdown in AI image generation. We certainly did, although Bing Chat picked up speed after a few minutes. In the worst case, the AI will refuse to do anything at all because it can't process new requests.

If you want to take generative AI on the go, be sure to check out TechRadar’s list of the four best art generator apps for iPhone.
