Forget Sora, Runway is the AI video maker coming to blow your mind

Artificial intelligence-powered video maker Runway has officially launched its new Gen-3 Alpha model after teasing its debut a few weeks ago. The Gen-3 Alpha video creator offers major upgrades in creating hyper-realistic videos from user prompts. It's a significant advancement over the Gen-2 model released early last year. 

Runway's Gen-3 Alpha is aimed at a range of content creators, including marketing and advertising groups. The startup claims to outdo any competition when it comes to handling complex transitions, as well as key-framing and human characters with expressive faces. The model was trained on a large video and image dataset annotated with descriptive captions, enabling it to generate highly realistic video clips. As of this writing, the company is not revealing the sources of its video and image datasets.

The new model is accessible to all users signed up on the RunwayML platform, but unlike Gen-1 and Gen-2, Gen-3 Alpha is not free. Users must upgrade to a paid plan, with prices starting at $12 per month per editor. This move suggests Runway is ready to professionalize its products now that it has had the chance to refine them, thanks to all of the people who played with the free models. 

Initially, Gen-3 Alpha will power Runway's text-to-video mode, allowing users to create videos using natural language prompts. In the coming days, the model's capabilities will expand to include image-to-video and video-to-video modes. Additionally, Gen-3 Alpha will integrate with Runway's control features, such as Motion Brush, Advanced Camera Controls, and Director Mode.

Runway stated that Gen-3 Alpha is only the first in a new line of models built for large-scale multimodal training. The end goal is what the company calls “General World Models,” which will be capable of representing and simulating a wide range of real-world situations and interactions.

AI Video Race

The immediate question is whether Runway's advancements can meet or exceed what OpenAI is doing with its attention-grabbing Sora model. While Sora promises one-minute-long videos, Runway's Gen-3 Alpha currently supports video clips that are only up to 10 seconds long. Despite this limitation, Runway is betting on Gen-3 Alpha's speed and quality to set it apart from Sora, at least until it can extend the model, as planned, to produce longer videos. 

The race isn't just about Sora. Stability AI, Pika, Luma Labs, and others are all eager to claim the title of best AI video creator. As the competition heats up, Runway's release of Gen-3 Alpha is a strategic move to assert a leading position in the market.


Sorry, Morgan Freeman is not narrating that viral TikTok video

Morgan Freeman's celebrated baritone has been repurposed for projects the actor has not approved, and he is not happy about it. Freeman called out those unauthorized artificial intelligence-fueled voice clones in a post on X, thanking supporters for alerting him and his management about where AI-replicated versions of his voice have appeared. 

“Thank you to my incredible fans for your vigilance and support in calling out the unauthorized use of an A.I. voice imitating me,” Freeman posted, adding hashtags such as “#scam,” “#imitation,” and “#IdentityProtection.”

Though Freeman doesn't cite it specifically, his post is likely a reference to a new video in a viral TikTok series where an ersatz version of his voice narrates the activities of his “niece,” TikTok user @justinescameraroll, known as Justine. 


Her “Day in the Life of a Nepo Niece” videos have collectively amassed over one million views from her 218,600 TikTok and 123,000 Instagram followers. She captioned her most recent post, which has the voice clone of Freeman narrating, with, “Uncle Mo has been booked and busy, but I finally got him to narrate my trip!” A post of the video shared on X then reached 16.4 million people and, judging by the timing, may have prompted Freeman's reaction. 

Justine later confirmed in a follow-up video that her video did not feature Freeman's real voice, adding, “I was just having a little bit of fun.”

Famous Fakes

The iconic nature of Freeman's voice means there's a lot of interest in imitating it, for everything from the social media videos mentioned above to full film narrations; ElevenLabs even made a voice specifically designed to imitate Freeman. The documentary “The Power of Chi,” for instance, lists Freeman as the narrator, and IMDb credits him that way, yet Freeman has never even mentioned it. His voice in the film also sounds more than a little off, as you can hear in the link. He might just be phoning it in for a paycheck from an obscure documentary, or it might be AI.

Freeman is far from alone among celebrities concerned about how AI-created versions of their face or voice might be used without their permission. In May, actress Scarlett Johansson voiced her anger upon discovering an OpenAI chatbot that sounded disturbingly similar to her voice. Johansson, who played an AI assistant in the 2013 film Her, found the situation particularly unsettling. OpenAI responded by announcing plans to discontinue the use of the ChatGPT voice that resembled Johansson's, though without admitting any fault.

The same goes for video, where deepfakes of celebrities are used to trick people into thinking a famous person has endorsed a scam. Tom Hanks has had to alert fans about a deepfake video of himself on social media. So has trusted British consumer finance expert Martin Lewis, who warned of a deepfake video that tried to trick people into sending money to a scam investment.

The rapid advancement of AI has outpaced regulatory measures, leading to situations where individuals' voices and likenesses can be replicated without consent. The concern over AI-generated imitations is not limited to actors. AI music creation startups Suno and Udio are facing a lawsuit from the Recording Industry Association of America (RIAA) and major music labels for copyright infringement.  


Runway’s new OpenAI Sora rival shows that AI video is getting frighteningly realistic

Just a week on from the arrival of Luma AI's Dream Machine, another big OpenAI Sora rival has just landed – and Runway's latest AI video generator might be the most impressive one yet.

Runway was one of the original text-to-video pioneers, launching its Gen-2 model back in March 2023. But its new Gen-3 Alpha model, which will apparently be “available for everyone over the coming days”, takes things up several notches with new photo-realistic powers and promises of real-world physics.

The demo videos (which you can see below) showcase how versatile Runway's new AI model is, with the clips including realistic human faces, drone shots, simulations of handheld cameras and atmospheric dreamscapes. Runway says that all of them were generated with Gen-3 Alpha “with no modifications”.

Apparently, Gen-3 Alpha is also “the first of an upcoming series of models” that have been trained “on a new infrastructure built for large-scale multimodal training”. Interestingly, Runway added that the new AI tool “represents a significant step towards our goal of building General World Models”, which could create possibilities for gaming and more.

A 'General World Model' is one that effectively simulates an environment, including its physics – which is why one of the sample videos shows the reflections on a woman's face as she looks through a train window.

These tools won't just be for us to level-up our GIF games either – Runway says it's “been collaborating and partnering with leading entertainment and media organizations to create custom versions of Gen-3 Alpha”, which means tailored versions of the model for specific looks and styles. So expect to see this tech powering adverts, shorts and more very soon.

When can you try it?

A middle-aged sad bald man becomes happy as a wig of curly hair and sunglasses fall suddenly on his head

(Image credit: Runway)

Last week, Luma AI's Dream Machine arrived to give us a free AI video generator to dabble with, but Runway's Gen-3 Alpha model is more targeted towards the other end of the AI video scale. 

It's been developed in collaboration with pro video creators with that audience in mind, although Runway says it'll be “available for everyone over the coming days”. You can create a free account to try Runway's AI tools, though you'll need to pay a monthly subscription (starting from $12 per month, or around £10 / AU$18 a month) to get more credits.

You can create videos using text prompts – the clip above, for example, was made using the prompt “a middle-aged sad bald man becomes happy as a wig of curly hair and sunglasses fall suddenly on his head”. Alternatively, you can use still images or videos as a starting point.

The realism on show is simultaneously impressive and slightly terrifying, but Runway states that the model will be released with a new set of safeguards against misuse, including an “in-house visual moderation system” and C2PA (Coalition for Content Provenance and Authenticity) provenance standards. Let the AI video battles commence.


The TikTok of AI video? Kling AI is a scarily impressive new OpenAI Sora rival

It feels like we're at a tipping point for AI video generators, and just a few months on from OpenAI's Sora taking social media by storm with its text-to-video skills, a new Chinese rival is doing the same.

Called Kling AI, the new “video generation model” is made by the Chinese TikTok rival Kuaishou, and it's currently only available as a public demo in China via a waitlist. But that hasn't stopped it from quickly going viral, with some impressive clips that suggest it's at least as capable as Sora.

You can see some of the early demo videos (like the one below) on the Kling AI website, while a number of threads on X (formerly Twitter) from the likes of Min Choi have rounded up what are claimed to be some impressive early creations made by the tool (with some help from editing apps).

A blue parrot turning its head

(Image credit: Kling AI)

As always, some caution needs to be applied with these early AI-generated clips, as they're cherry-picked examples, and we don't yet know anything about the hardware or other software that's been used to create them. 

For example, we later found that an impressive Air Head video seemingly made by OpenAI's Sora needed a lot of extra editing in post-production.


Still, those caveats aside, Kling AI certainly looks like another powerful AI video generator. It lets early testers create 1080/30p videos that are up to two minutes in length. The results, while still carrying some AI giveaways like smoothing and minor artifacts, are impressively varied, with a promising amount of coherence.

Exactly how long it'll be before Kling AI is opened up to users outside China remains to be seen. But with OpenAI suggesting that Sora will get a public release “later this year”, Kling AI best not wait too long if it wants to become the TikTok of AI-generated video.

The AI video war heats up

Now that AI photo tools like Midjourney and Adobe Firefly are hitting the mainstream, it's clear that video generators are the next big AI battleground – and that has big implications for social media, the movie industry, and our ability to trust what we see during, say, major election campaigns.

Other examples of AI generators include Google Veo, Microsoft's VASA-1 (which can make lifelike talking avatars from a single photo), Runway Gen-2, and Pika Labs. Adobe has even shown how it could soon integrate many of these tools into Premiere Pro, which would give the space another big boost.

None of them are yet perfect, and it isn't clear how long it takes to produce a clip using the likes of Sora or Kling AI, nor what kind of computing power is needed. But the leaps being made towards photorealism and simulating real-world physics have been massive in the past year, so it clearly won't be long before these tools hit the mainstream.

That battle will become an international one, too – with the US still threatening a TikTok ban, expect there to be a few more twists and turns before the likes of Kling AI roll out worldwide. 


Google Search is getting a massive upgrade – including letting you search with video

Google I/O 2024's entire two-hour keynote was devoted to Gemini. Not a peep was uttered about the recently launched Pixel 8a or what Android 15 will bring at release. The only times a smartphone or Android came up at all were in the context of how they're being improved by Gemini.

The tech giant is clearly going all-in on the AI, so much so that the stream concludes by boldly displaying the words “Welcome to the Gemini era”. 

Among all the updates that were presented at the event, Google Search is slated to gain some of the more impressive changes. You could even argue that in 2024 the search engine will receive one of the most impactful upgrades in its 25 years as a major tech platform. Gemini gives Google Search a huge performance boost, and we can’t help but feel excited about it.

Below is a quick rundown of all the new features Google Search will receive this year.

1. AI Overviews

Google IO 2024

(Image credit: Google)

The biggest upgrade coming to the search engine is AI Overviews, which appears to be the launch version of SGE (Search Generative Experience). It provides detailed, AI-generated answers to queries. Responses come complete with contextually relevant text as well as links to sources and suggestions for follow-up questions.

Starting today, AI Overviews is leaving Google Labs and rolling out to everyone in the United States as a fully-fledged feature. For anyone who used the SGE, it appears to be identical. 

Response layouts are the same, and they’ll have product links too. Google has presumably worked out all the kinks so it performs optimally, although when it comes to generative AI there’s still a chance it could hallucinate.

There are plans to expand AI Overviews to more countries with the goal of reaching over a billion people by the end of 2024. Google noted the expansion is happening “soon,” but an exact date was not given.

2. Video Search

Google IO 2024

(Image credit: Google)

AI Overviews is bringing more to Google Search than just detailed results. One of the new features allows users to upload videos to the engine alongside a text inquiry. At I/O 2024, the presenter gave the example of purchasing a record player with faulty parts. 

You can upload a clip and ask the AI what's wrong with your player, and it’ll provide a detailed answer mentioning the exact part that needs to be replaced, plus instructions on how to fix the problem. You might need a new tonearm or a cueing lever, but you won't need to type a question into Google to get an answer. Instead, you can speak directly into the video and send it off.

Searching With Video will launch soon for “Search Labs users in English in the US,” with plans for further expansion into additional regions over time. 

3. Smarter AI

Google IO 2024

(Image credit: Google)

Next, Google is introducing several performance boosts; however, none of them are available at the moment. They’ll roll out soon to the Search Labs program, exclusively to people in the United States and in English. 

First, you'll be able to click one of two buttons at the top to simplify an AI Overview response or ask for more details. You can also choose to return to the original answer at any time.

Second, AI Overviews will be able to understand complex questions better than before. Users won’t have to ask the search engine multiple short questions. Instead, you can enter one long inquiry – for example, a user can ask it to find a specific yoga studio with introductory packages nearby.

Lastly, Google Search can create “plans” for you. This can be either a three-day meal plan that’s easy to prepare or a vacation itinerary for your next trip. It’ll provide links to the recipes plus the option to replace dishes you don't like. Later down the line, the planning tool will encompass other topics like movies, music, and hotels.

All about Gemini

That’s pretty much all of the changes coming to Google Search in a nutshell. If you’re interested in trying these out and you live in the United States, head over to the Search Labs website, sign up for the program, and give the experimental AI features a go. You’ll find them near the top of the page.

Google I/O 2024 dropped a ton of information on the tech giant’s upcoming AI endeavors. Project Astra, in particular, looked very interesting, as it can identify objects, code on a monitor, and even pinpoint the city you’re in just by looking outside a window. 

Ask Photos was pretty cool, too, if a little freaky. It’s an upcoming Google Photos tool capable of finding specific images in your account much faster than before and handling “more in-depth queries” with startling accuracy.

If you want a full breakdown, check out TechRadar's list of the seven biggest AI announcements from Google I/O 2024.


OpenAI’s Sora just made another brain-melting music video and we’re starting to see a theme

OpenAI's text-to-video tool has been a busy bee recently, helping to make a short film about a man with a balloon for a head and giving us a glimpse of the future of TED Talks – and now it's rustled up its first official music video for the synth-pop artist Washed Out (below).

This isn't the first music video we've seen from Sora – earlier this month we saw this one for independent musician August Kamp – but it is the first official commissioned example from an established music video director and artist.

That director is Paul Trillo, an artist who's previously made videos for the likes of The Shins and shared this new one on X (formerly Twitter). He said the video, which flies through a tunnel-like collage of high school scenes, was “an idea I had almost 10 years ago and then abandoned”, but that he was “finally able to bring it to life” with Sora.

It isn't clear exactly why Sora was an essential component for executing a fairly simple concept, but it helped make the process much simpler and quicker. Trillo points to one of his earlier music videos, The Great Divide for The Shins, which uses a similar effect but was “entirely 3D animated”.

As for how this new Washed Out video was made, it required less non-Sora help than the Shy Kids' Air Head video, which involved some lengthy post-production to create the necessary camera effects and consistency. For this one, Trillo said he used text-to-video prompts in Sora, then cut the resulting 55 clips together in Premiere Pro with only “very minor touch-ups”.

The result is a video that, like Sora's TED Talks creation (which was also created by Trillo), hints at the tool's strengths and weaknesses. While it does show that digital special effects are going to be democratized for visual projects with tight budgets, it also reveals Sora's issues with coherency across frames (as characters morph and change) and its persistent sense of uncanny valley.

As in the TED Talks video, a common way to get around these limitations is the dreamy fly-through, which ensures that characters are only on-screen fleetingly and that any weird morphing reads as part of the look rather than a jarring mistake. While it works for this video, it could quickly become a trope if it's over-used.

A music video tradition

Two people sitting on the top deck of a bus

(Image credit: OpenAI / Washed Out)

Music videos have long been pioneers of new digital technology – the Dire Straits video for Money For Nothing in 1985, for example, gave us an early taste of 3D animation, while Michael Jackson's Black Or White showed off the digital morphing trick that quickly became ubiquitous in the early 90s (see Terminator 2: Judgment Day). 

While music videos lack the cultural influence they once did, it looks like they'll again be a playground for AI-powered effects like the ones in this Washed Out creation. That makes sense because Sora, which OpenAI expects to release to the public “later this year”, is still well short of being good enough to be used in full-blown movies.

We can expect to see these kinds of effects everywhere by the end of the year, from adverts to TikTok promos. But like those landmark effects in earlier music videos, they will also likely date pretty quickly and become visual cliches that go out of fashion.

If Sora can develop at the same rate as OpenAI's flagship tool, ChatGPT, it could evolve into something more reliable, flexible, and mainstream – with Adobe recently hinting that the tool could soon be a plug-in for Adobe Premiere Pro. Until then, expect to see a lot more psychedelic Sora videos that look like a mashup of your dreams (or nightmares) from last night.


Turns out the viral ‘Air Head’ Sora video wasn’t purely the work of AI, as we were led to believe

A new interview with the director behind the viral Sora clip Air Head has revealed that AI played a smaller part in its production than was originally claimed. 

In an interview with Fxguide, Patrick Cederberg (who did the post-production for the viral video) confirmed that OpenAI's text-to-video program was far from the only force involved in its production. The one-minute, 21-second clip was made with a combination of traditional filmmaking techniques and post-production editing to achieve the look of the final picture.

Air Head was made by Shy Kids and tells the short story of a man with a literal balloon for a head. While a human voiceover was used, the way OpenAI pushed the clip on social channels such as YouTube certainly left the impression that the visuals were purely powered by AI, but that's not entirely true. 

As revealed in the behind-the-scenes clip, a ton of work was done by Shy Kids, who took the raw output from Sora and cleaned it up into the finished product. This included manually rotoscoping the backgrounds, removing the faces that would occasionally appear on the balloons, and color correcting. 

Then there's the fact that Sora takes a ton of time to actually get things right. Cederberg explains that there were “hundreds of generations at 10 to 20 seconds apiece”, which were then tightly edited in what the team described as a “300:1” ratio of footage generated versus footage kept for further touch-ups. 

Such manual work also included editing out the head, which would appear and reappear, and even changing the color of the balloon itself, which would sometimes render as red instead of yellow. While Sora was used to generate the initial imagery with good results, there was clearly a lot more happening behind the scenes to make the finished product look as good as it does, so we're still a long way from instantly generated, movie-quality productions. 

Sora remains tightly under wraps save for a handful of carefully curated projects that have been allowed to surface, with Air Head among the most popular. The clip has over 120,000 views at the time of writing, with OpenAI touting it as “experimentation” with the program and downplaying the obvious work that went into the final product. 

Sora is impressive but we're not convinced

While OpenAI has done a decent job of showcasing what its text-to-video model can do, the lack of transparency is worrying. 

Air Head is an impressive clip by a talented team, but it was subject to a ton of editing to get the final product to where it is in the short. 

It's not quite the one-click-and-you're-done approach that many of the tech's boosters have represented it as. Instead, it turns out to be a tool for enhancing imagery rather than creating it from scratch, something that's already common enough in video production, which makes Sora seem less revolutionary than it first appeared.


Adobe’s next big project is an AI that can upscale low-res video to 8x its original quality

A group of Adobe researchers recently published a paper on a new generative AI model called VideoGigaGAN, and we believe it may show up in a future product. What it does is upscale low-quality videos by up to eight times their original resolution without sacrificing stability or important aspects of the source material. Several demo clips on the project’s website show off its abilities; it can turn a blurry 128×128 pixel video of a waterfall into footage running at 1,024×1,024 pixels.


What’s noteworthy about the AI is that it doesn’t skimp on the finer details. Skin texture, wrinkles, strands of hair, and more are visible on the faces of human subjects. The other demos feature a similar level of quality: you can better make out a swan swimming in a pond and the blossom on a tree thanks to this tech. It may seem bizarre to focus so much on skin wrinkles or feathers, but it is exactly this level of detail that companies like Adobe must nail down if they aim to implement image-enhancing AI on a wide scale.

Improving AI

You probably have a couple of questions about Adobe’s latest project, starting with how it works. Well, it’s complicated. 

The “GAN” in VideoGigaGAN stands for generative adversarial network, a type of AI capable of creating realistic images. Adobe’s version is specifically based on GigaGAN, which specializes in upscaling generated content as well as real photos. The problem with this tech, as The Verge points out, is that it can’t improve the quality of videos without multiple problems cropping up, like weird artifacts. To solve this issue, Adobe’s researchers used a variety of techniques.

The research paper explains the whole process, and you can read it yourself to get the full picture, although it is dense material. Basically, the researchers introduced a “flow-guided propagation module” to ensure consistency among a video’s frames, anti-aliasing to reduce artifacts, and a “high-frequency feature shuttle” to make up for sudden drops in detail. There is more to VideoGigaGAN than what we just described, but that’s the gist of it.
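The paper is the place to go for specifics, but to give a rough feel for why temporal consistency is the hard part, here's a minimal, NumPy-only toy sketch – our illustration, not Adobe's method. Each frame is upscaled on its own, then blended with the previous upscaled frame so fine detail doesn't flicker from frame to frame. The 128×128-to-1,024×1,024 sizes mirror the waterfall demo mentioned above; everything else, including the crude blending stand-in for flow-guided propagation, is assumed purely for illustration.

```python
# Toy illustration only: per-frame upscaling plus naive temporal blending.
# VideoGigaGAN itself uses a learned GAN, flow-guided propagation,
# anti-aliasing, and a high-frequency feature shuttle; none of that is here.
import numpy as np

def bilinear_upscale(frame: np.ndarray, scale: int) -> np.ndarray:
    """Upscale a 2D (grayscale) frame by an integer factor with bilinear sampling."""
    h, w = frame.shape
    ys = np.linspace(0, h - 1, h * scale)
    xs = np.linspace(0, w - 1, w * scale)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    top = frame[np.ix_(y0, x0)] * (1 - wx) + frame[np.ix_(y0, x1)] * wx
    bottom = frame[np.ix_(y1, x0)] * (1 - wx) + frame[np.ix_(y1, x1)] * wx
    return top * (1 - wy)[:, None] + bottom * wy[:, None]

def upscale_video(frames, scale=8, blend=0.6):
    """Upscale each frame, then mix in the previous upscaled frame so that
    detail stays consistent over time instead of flickering."""
    out, prev = [], None
    for frame in frames:
        up = bilinear_upscale(frame.astype(float), scale)
        if prev is not None:
            up = blend * up + (1 - blend) * prev  # crude temporal propagation
        out.append(up)
        prev = up
    return out

# Ten fake 128x128 frames in, ten 1,024x1,024 frames out.
low_res = [np.random.rand(128, 128) for _ in range(10)]
high_res = upscale_video(low_res)
print(high_res[0].shape)  # (1024, 1024)
```

The naive blend above would smear anything that moves, which is presumably why the paper leans on a flow-guided propagation module instead: carrying detail along the estimated motion between frames is how you keep them consistent without that smearing.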

Potential inclusion

Will we see this in an upcoming Adobe product, or will it roll out as a standalone app? Most likely – at least we think so. 

In the past year, the company has been focusing heavily on implementing artificial intelligence across its software, from the launch of Firefly to Acrobat’s new assistant. A few months ago at Adobe MAX 2023, a video upscaler referred to as Project Res Up was previewed, and its performance resembles what we see in the VideoGigaGAN demos. An old movie from the 1940s goes from running at 480 x 360 to a crisp 1,280 x 960, blurry footage of an elephant in a river becomes crystal clear, and the presenter even mentions how the software can upscale a clip to four times its original quality. 

Admittedly, this is conjecture, but it’s entirely possible VideoGigaGAN may be the engine behind Project Res Up. A future Adobe product could give people a way to upscale old family videos or low-quality footage into something closer to the movie we envision in our minds. Perhaps the recent preview is a hint at an imminent release.

VideoGigaGAN is still deep in development so it’s unknown when or if it’ll come out. There are several obstacles in the way. The AI can’t properly process videos beyond 200 frames or render small objects, but we'll definitely be keeping an eye on it.

In the meantime, check out TechRadar's list of the best AI image upscalers for 2024.


OpenAI’s new Sora video is an FPV drone ride through the strangest TED Talk you’ve ever seen – and I need to lie down

OpenAI's new Sora text-to-video generation tool won't be publicly available until later this year, but in the meantime it's serving up some tantalizing glimpses of what it can do – including a mind-bending new video (below) showing what TED Talks might look like in 40 years.

To create the FPV drone-style video, TED Talks worked with OpenAI and the filmmaker Paul Trillo, who's been using Sora since February. The result is an impressive, if slightly bewildering, fly-through of futuristic conference talks, weird laboratories and underwater tunnels.

The video again shows both the incredible potential of OpenAI Sora and its limitations. The FPV drone-style effect has become a popular one for hard-hitting social media videos, but it traditionally requires advanced drone piloting skills and expensive kit that goes way beyond the new DJI Avata 2.

Sora's new video shows that these kinds of effects could be opened up to new creators, potentially at a vastly lower cost – although that comes with the caveat that we don't yet know how much OpenAI's new tool will cost or who it'll be available to.


But the video (above) also shows that Sora is still quite far short of being a reliable tool for full-blown movies. The people in the shots are on-screen for only a couple of seconds and there's plenty of uncanny valley nightmare fuel in the background.

The result is an experience that's exhilarating, while also leaving you feeling strangely off-kilter – like touching down again after a sky dive. Still, I'm definitely keen to see more samples as we hurtle towards Sora's public launch later in 2024.

How was the video made?

A video created by OpenAI Sora for TED Talks

(Image credit: OpenAI / TED Talks)

OpenAI and TED Talks didn't go into detail about how this specific video was made, but its creator, Paul Trillo, recently talked more broadly about his experience as one of Sora's alpha testers.

Trillo told Business Insider about the kinds of prompts he uses, including “a cocktail of words that I use to make sure that it feels less like a video game and something more filmic”. Apparently these include prompts like “35 millimeter”, “anamorphic lens”, and “depth of field lens vignette”, which are needed or else Sora will “kind of default to this very digital-looking output”.

Right now, every prompt has to go through OpenAI so it can be run through its strict safeguards around issues like copyright. One of Trillo's most interesting observations is that Sora is currently “like a slot machine where you ask for something, and it jumbles ideas together, and it doesn't have a real physics engine to it”.

This means that it's still a long way off from being truly consistent with people and object states, something that OpenAI admitted in an earlier blog post. OpenAI said that Sora “currently exhibits numerous limitations as a simulator”, including the fact that “it does not accurately model the physics of many basic interactions, like glass shattering”.

These incoherencies will likely limit Sora to being a short-form video tool for some time, but it's still one I can't wait to try out.


Microsoft’s VASA-1 AI video generation system can make lifelike avatars that speak volumes from a single photo

AI-generated video is already a reality, and now another player has joined the fray: Microsoft. Apparently, the tech giant has developed a generative AI system that can whip up realistic talking avatars from a single picture and an audio clip. The tool is named VASA-1, and it goes beyond mimicking mouth movement; it can capture lifelike emotions and produce natural-looking movements as well.

The system lets users modify the subject’s eye movements, the perceived distance from the subject, and the emotions expressed. VASA-1 is the first model in what is rumored to be a series of AI tools, and MSPowerUser reports that it can conjure up specific facial expressions, synchronize lip movements to a high degree, and produce human-like head motions. 

It can offer a wide range of emotions to choose from and generate facial subtleties, which sounds like it could make for a scarily convincing result. 

How VASA-1 works and what it's capable of

Seemingly taking a cue from how human 3D animators and modelers work, VASA-1 makes use of a process it calls ‘disentanglement’, allowing the system to control and edit facial expressions, the 3D head position, and facial features independently of each other – and this is what powers VASA-1’s realism.

As you might be imagining already, this has seismic potential, offering the possibility to totally change our experiences of digital apps and interfaces. According to MSPowerUser, VASA-1 can produce videos unlike those that it was trained on. Apparently, the system wasn’t trained on artistic photos, singing voices, or non-English speech, but if you request a video that features one of these, it’ll oblige. 

The Microsoft researchers behind VASA-1 praise its real-time efficiency, stating that the system can make fairly high-resolution videos (512×512 pixels) with high frame rates. Frame rate, or frames per second (fps), is the frequency at which a series of images (referred to as frames) can be captured or displayed in succession within a piece of media. The researchers claim that VASA-1 can generate videos with 45fps in offline mode, and 40fps with online generation. 

You can check out the state of VASA-1 and learn more about it on Microsoft’s dedicated webpage for the project. It has several demonstrations and includes links to download information about it, ending with a section headlined ‘Risks and responsible AI considerations.’

Works like magic – but is it a miracle spell or a recipe for disaster?

In this final reflective section, Microsoft acknowledges that a tool like this has plentiful scope for misuse, but the researchers try to emphasize the potential positives of VASA-1. They’re not wrong; a technology like this could mean next-level educational experiences that are available to more students than ever before, better assistance to people who have difficulties communicating, the capability to provide companionship, and improved digital therapeutic support. 

All of that said, it would be foolish to ignore the potential for harm and wrongdoing with something like this. Microsoft does state that it doesn’t currently have plans to make VASA-1 available in any form to the public until it’s reassured that “the technology will be used responsibly and in accordance with proper regulations.” If Microsoft sticks to this ethos, I think it could be a long wait. 

All in all, I think it’s becoming hard to deny that generative AI video tools are going to become more commonplace and the countdown to when they saturate our lives has begun. Google has been working on an analogous AI system with the moniker VLOGGER, and also recently put out a paper detailing how VLOGGER can create realistic videos of people moving, speaking, and gesturing with the input of a single photo. 

OpenAI also made headlines recently by introducing its own AI video generation tool, Sora, which can generate videos from text descriptions. OpenAI explained how Sora works on a dedicated page, and provided demonstrations that impressed a lot of people – and worried even more. 

I am wary of what these innovations will enable us to do, and I’m glad that, as far as we know, all three of these new tools are being kept tightly under wraps. I think realistically the best guardrails we have against the misuse of technologies like these are airtight regulations, but I’m doubtful that all governments will take these steps in time. 
