Microsoft’s VASA-1 AI video generation system can make lifelike avatars that speak volumes from a single photo

AI-generated video is already a reality, and now another player has joined the fray: Microsoft. Apparently, the tech giant has developed a generative AI system that can whip up realistic talking avatars from a single picture and an audio clip. The tool is named VASA-1, and it goes beyond mimicking mouth movement; it can capture lifelike emotions and produce natural-looking movements as well.

The system lets users modify the subject’s eye movements, the perceived distance of the subject, and the emotions expressed. VASA-1 is the first model in what is rumored to be a series of AI tools, and MSPowerUser reports that it can conjure up specific facial expressions, closely synchronize lip movements with the audio, and produce human-like head motions.

It can offer a wide range of emotions to choose from and generate facial subtleties, which sounds like it could make for a scarily convincing result. 

How VASA-1 works and what it's capable of

Seemingly taking a cue from how human 3D animators and modelers work, VASA-1 makes use of a process called ‘disentanglement,’ which allows the system to control and edit the facial expressions, 3D head position, and facial features independently of each other. This is what powers VASA-1’s realism.
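
To make the idea of independent controls more concrete, here is a minimal sketch in Python. It is purely illustrative and is not VASA-1's actual architecture or API: the `FaceControls` class, its field names, and the example values are all hypothetical. It only shows what it means for one factor to change while the others stay fixed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FaceControls:
    """Hypothetical container with one field per independently editable factor."""
    identity: str                          # facial features, e.g. taken from the source photo
    head_pose: tuple[float, float, float]  # assumed yaw, pitch, roll in degrees
    expression: str                        # assumed emotion label


# Starting point: a neutral, front-facing subject (hypothetical values).
base = FaceControls(identity="portrait.jpg",
                    head_pose=(0.0, 0.0, 0.0),
                    expression="neutral")

# Change only the expression; identity and head pose are untouched.
smiling = replace(base, expression="happy")

# Change only the head pose; identity and expression are untouched.
turned = replace(base, head_pose=(15.0, 0.0, 0.0))

print(smiling)
print(turned)
```

In a real system these factors would be learned representations rather than readable labels, but the principle the article describes is the same: editing one should not disturb the others.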

As you might already imagine, this has seismic potential and could totally change how we experience digital apps and interfaces. According to MSPowerUser, VASA-1 can produce videos unlike those it was trained on. Apparently, the system wasn’t trained on artistic photos, singing voices, or non-English speech, but if you request a video that features one of these, it’ll oblige.

The Microsoft researchers behind VASA-1 praise its real-time efficiency, stating that the system can make fairly high-resolution videos (512×512 pixels) with high frame rates. Frame rate, or frames per second (fps), is the frequency at which a series of images (referred to as frames) is captured or displayed in succession within a piece of media. The researchers claim that VASA-1 can generate videos at 45fps in offline mode and 40fps with online generation.
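
To put those frame rates in perspective, the short Python sketch below converts fps into the time available to produce each frame and the number of frames in a clip. The function names and the 10-second clip length are our own assumptions for illustration; only the 45fps and 40fps figures come from the researchers' claims.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available to produce one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def frames_needed(duration_s: float, fps: float) -> int:
    """Return how many frames a clip of the given length contains."""
    return round(duration_s * fps)


# The two frame rates quoted for VASA-1; the clip length is an arbitrary example.
for label, fps in (("offline", 45), ("online", 40)):
    print(f"{label}: {fps} fps -> {frame_budget_ms(fps):.1f} ms per frame, "
          f"{frames_needed(10, fps)} frames for a 10-second clip")
```

In other words, at 40fps the system has about 25 milliseconds to produce each 512×512 frame, which is what makes the real-time claim notable.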

You can check out the state of VASA-1 and learn more about it on Microsoft’s dedicated webpage for the project. It has several demonstrations and includes links to download information about it, ending with a section headlined ‘Risks and responsible AI considerations.’

Works like magic – but is it a miracle spell or a recipe for disaster?

In this final reflective section, Microsoft acknowledges that a tool like this has plentiful scope for misuse, but the researchers try to emphasize the potential positives of VASA-1. They’re not wrong; a technology like this could mean next-level educational experiences that are available to more students than ever before, better assistance to people who have difficulties communicating, the capability to provide companionship, and improved digital therapeutic support. 

All of that said, it would be foolish to ignore the potential for harm and wrongdoing with something like this. Microsoft does state that it doesn’t plan to make VASA-1 available to the public in any form until it’s reassured that “the technology will be used responsibly and in accordance with proper regulations.” If Microsoft sticks to this ethos, I think it could be a long wait.

All in all, I think it’s becoming hard to deny that generative AI video tools are going to become more commonplace, and the countdown to when they saturate our lives has begun. Google has been working on an analogous AI system with the moniker VLOGGER, and recently published a paper detailing how VLOGGER can create realistic videos of people moving, speaking, and gesturing from a single photo.

OpenAI also made headlines recently by introducing its own AI video generation tool, Sora, which can generate videos from text descriptions. OpenAI explained how Sora works on a dedicated page, and provided demonstrations that impressed a lot of people – and worried even more. 

I am wary of what these innovations will enable us to do, and I’m glad that, as far as we know, all three of these new tools are being kept tightly under wraps. I think realistically the best guardrails we have against the misuse of technologies like these are airtight regulations, but I’m doubtful that all governments will take these steps in time. 

TechRadar – All the latest technology news

Adobe’s new beta Express app gives you Firefly AI image generation for free

Adobe has released a new beta version of its Express app, letting users try out its Firefly generative AI on mobile for the first time.

The AI functions much like Firefly on the web and shares many of the same features. You can have the AI engine create images from a single text prompt, insert or remove objects from images, and add words with special effects. The service also offers resources like background music tracks, stock videos, and a content scheduler for posting on social media platforms. It’s important to mention that all these features and more normally require a subscription to Adobe Express Premium. But, according to the announcement, everything will be available for free while the beta is ongoing. Once it’s over, you’ll have to pay the $10-a-month subscription to keep using the tools.

[Image: Adobe Express with Firefly features (credit: Adobe)]

Art projects on the current Express app will not be found in the beta – at least not right now. Ian Wang, who is the vice president of product for Adobe Express, told The Verge that once Express with Firefly exits beta, all the “historical data from the old app” will carry over to the new one. 

The new replacement

Adobe is planning on making Express with Firefly the main platform moving forward. It’s unknown when the beta will end. A company representative couldn’t give us an exact date, but they told us the company is currently collecting feedback for the eventual launch. When the trial period ends, the representative stated, “All eligible devices will be automatically updated to the new [app]”.

We managed to gain access to the beta and the way it works is pretty simple. Upon installation, you’ll see a revolving carousel of the AI tools at the top. For this quick demo, we’ll have Firefly make an image from a text prompt. Tap the option, then enter whatever you want to see from the AI.

[Image: Adobe Express with Firefly demo (credit: Future)]

Give it a few seconds to generate the content, and you’ll be given multiple pictures to choose from. From there, you can edit the image to your liking. After you’re all done, you can publish the finished product on social media or share it with someone.

Availability

Android users can download the beta directly from the Google Play Store. iPhone owners, on the other hand, will have a harder time. Apple has restrictions on how many testers can have access to beta software at a time. iOS users will instead have to join Adobe’s waitlist first and wait to get chosen. If you’re one of the lucky few, the company will guide you through the process of installing the app on your iPhone.

There is a system requirements page listing all of the smartphones eligible for the beta; however, it doesn’t appear to be a super strict list. The device we used was a OnePlus Nord N20 and it ran the app just fine. Adobe’s website also lists the supported languages, which include English, French, Korean, and Brazilian Portuguese.

Check out TechRadar's list of the best photo editors for 2024 if you want more robust tools.

Google explains how Gemini’s AI image generation went wrong, and how it’ll fix it

A few weeks ago, Google launched a new image generation tool for Gemini (the suite of AI tools formerly known as Bard and Duet), which allowed users to generate all sorts of images from simple text prompts. Unfortunately, Google’s AI tool repeatedly missed the mark and generated inaccurate and even offensive images that led a lot of us to wonder – how did the bot get things so wrong? Well, the company has finally released a statement explaining what went wrong, and how it plans to fix Gemini.

The official blog post addressing the issue states that when designing the text-to-image feature for Gemini, the team behind Gemini wanted to “ensure it doesn’t fall into some of the traps we’ve seen in the past with image generation technology — such as creating violent or sexually explicit images, or depictions of real people.” The post further explains that users probably don’t want to keep seeing people of just one ethnicity or other prominent characteristic. 

So, to offer a pretty basic explanation for what’s been going on: Gemini has been throwing up images of people of color when prompted to generate images of white historical figures, giving users ‘diverse Nazis’, or simply ignoring the part of your prompt where you’ve specified exactly what you’re looking for. Gemini’s image capabilities are currently on hold, but when the feature was accessible, you could specify exactly who you were trying to generate (Google uses the example “a white veterinarian with a dog”) and Gemini would seemingly ignore the first half of that prompt and generate veterinarians of all races except the one you asked for.

Google went on to explain that this was the outcome of two crucial failings – firstly, Gemini’s tuning to show a range of different people failed to account for cases that clearly should not show a range. Alongside that, in trying to make a more conscious, less biased generative AI, Google admits the “model became way more cautious than we intended and refused to answer certain prompts entirely – wrongly interpreting some very anodyne prompts as sensitive.”

So, what's next?

At the time of writing, the ability to generate images of people on Gemini has been paused while the Gemini team works to fix the inaccuracies and carry out further testing. The blog post notes that AI ‘hallucinations’ are nothing new when it comes to complex deep learning models – even Bard and ChatGPT had some questionable tantrums as the creators of those bots worked out the kinks. 

The post ends with a promise from Google to keep working on Gemini’s AI-powered people generation until everything is sorted, with the note that while the team can’t promise it won’t ever generate “embarrassing, inaccurate or offensive results”, action is being taken to make sure it happens as little as possible. 

All in all, this whole episode puts into perspective that AI is only as smart as we make it. Our editor-in-chief Lance Ulanoff succinctly noted that “When an AI doesn't know history, you can't blame the AI.” With how quickly artificial intelligence has swooped in and crammed itself into various facets of our daily lives – whether we want it or not – it’s easy to forget that the public proliferation of AI started just 18 months ago. As impressive as the tools currently available to us are, we’re ultimately still in the early days of artificial intelligence. 

We can’t rain on Google Gemini’s parade just because the mistakes were more visually striking than, say, ChatGPT’s recent gibberish-filled meltdown. Google’s temporary pause and reworking will ultimately lead to a better product, and sooner or later we’ll see the tool as it was meant to be.
