speak Archives - Shapiro Consultants

Microsoft’s VASA-1 AI video generation system can make lifelike avatars that speak volumes from a single photo

April 22, 2024

AI-generated video is already a reality, and now another player has joined the fray: Microsoft. Apparently, the tech giant has developed a generative AI system that can whip up realistic talking avatars from a single picture and an audio clip. The tool is named VASA-1, and it goes beyond mimicking mouth movement; it can capture lifelike emotions and produce natural-looking movements as well.

The system lets users modify the subject’s eye movements, the perceived distance of the subject, and the emotions expressed. VASA-1 is the first model in what is rumored to be a series of AI tools, and MSPowerUser reports that it can conjure up specific facial expressions, synchronize lip movements to a high degree, and produce human-like head motions.

It can offer a wide range of emotions to choose from and generate facial subtleties, which sounds like it could make for a scarily convincing result.

How VASA-1 works and what it's capable of

Seemingly taking a note from how human 3D animators and modelers work, VASA-1 uses a process Microsoft calls ‘disentanglement,’ which lets the system control and edit facial expressions, 3D head position, and facial features independently of each other. This is what powers VASA-1’s realism.
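
To make the idea of independent controls concrete, here is a toy sketch in Python. It is not VASA-1's actual representation, which the article does not describe: the `FaceLatents` class, its field names, and the helper functions are all hypothetical, chosen only to show what it means to change one factor (expression or head pose) while leaving the others untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FaceLatents:
    """Hypothetical, simplified stand-in for separated face controls."""
    appearance: tuple   # facial features / identity (made-up numbers)
    head_pose: tuple    # (yaw, pitch, roll) in degrees
    expression: str     # e.g. "neutral", "happy"


def with_expression(latents: FaceLatents, expression: str) -> FaceLatents:
    # Returns a copy with only the expression changed.
    return replace(latents, expression=expression)


def with_head_pose(latents: FaceLatents, yaw: float, pitch: float, roll: float) -> FaceLatents:
    # Returns a copy with only the head pose changed.
    return replace(latents, head_pose=(yaw, pitch, roll))


base = FaceLatents(appearance=(0.12, -0.40, 0.88),
                   head_pose=(0.0, 0.0, 0.0),
                   expression="neutral")
smiling = with_expression(base, "happy")
turned = with_head_pose(smiling, yaw=15.0, pitch=-5.0, roll=0.0)

print(turned)
```

Because each helper returns a copy with a single field replaced, `turned` keeps the same `appearance` as `base` even though both its expression and head pose have been edited.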

As you might be imagining already, this has seismic potential, offering the possibility to totally change our experiences of digital apps and interfaces. According to MSPowerUser, VASA-1 can produce videos unlike those that it was trained on. Apparently, the system wasn’t trained on artistic photos, singing voices, or non-English speech, but if you request a video that features one of these, it’ll oblige.

The Microsoft researchers behind VASA-1 praise its real-time efficiency, stating that the system can make fairly high-resolution videos (512×512 pixels) with high frame rates. Frame rate, or frames per second (fps), is the frequency at which a series of images (referred to as frames) can be captured or displayed in succession within a piece of media. The researchers claim that VASA-1 can generate videos at 45fps in offline mode and 40fps with online generation.
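
For a rough sense of what those frame rates mean, the short Python sketch below converts fps into the time available to produce each frame and the number of frames in a clip. The function names and the 10-second clip length are illustrative assumptions; only the 45fps and 40fps figures come from the researchers' claims.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def frames_for_clip(fps: float, seconds: float) -> int:
    """Number of frames needed for a clip of the given length."""
    return round(fps * seconds)


# 45 fps (offline) and 40 fps (online) are the figures quoted above;
# the 10-second clip length is an arbitrary example.
for label, fps in (("offline", 45), ("online", 40)):
    print(f"{label}: {fps} fps -> {frame_budget_ms(fps):.1f} ms per frame, "
          f"{frames_for_clip(fps, 10)} frames per 10 s")
```

At 45fps each frame has to be ready in about 22.2 milliseconds, and at 40fps in 25 milliseconds, which is why the researchers describe the system as working in real time.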

You can check out the state of VASA-1 and learn more about it on Microsoft’s dedicated webpage for the project. It has several demonstrations and includes links to download information about it, ending with a section headlined ‘Risks and responsible AI considerations.’

Works like magic – but is it a miracle spell or a recipe for disaster?

In this final reflective section, Microsoft acknowledges that a tool like this has plentiful scope for misuse, but the researchers try to emphasize the potential positives of VASA-1. They’re not wrong; a technology like this could mean next-level educational experiences that are available to more students than ever before, better assistance to people who have difficulties communicating, the capability to provide companionship, and improved digital therapeutic support.

All of that said, it would be foolish to ignore the potential for harm and wrongdoing with something like this. Microsoft does state that it doesn’t currently have plans to make VASA-1 available in any form to the public until it’s reassured that “the technology will be used responsibly and in accordance with proper regulations.” If Microsoft sticks to this ethos, I think it could be a long wait.

All in all, I think it’s becoming hard to deny that generative AI video tools are going to become more commonplace and the countdown to when they saturate our lives has begun. Google has been working on an analogous AI system with the moniker VLOGGER, and also recently put out a paper detailing how VLOGGER can create realistic videos of people moving, speaking, and gesturing with the input of a single photo.

OpenAI also made headlines recently by introducing its own AI video generation tool, Sora, which can generate videos from text descriptions. OpenAI explained how Sora works on a dedicated page, and provided demonstrations that impressed a lot of people – and worried even more.

I am wary of what these innovations will enable us to do, and I’m glad that, as far as we know, all three of these new tools are being kept tightly under wraps. I think realistically the best guardrails we have against the misuse of technologies like these are airtight regulations, but I’m doubtful that all governments will take these steps in time.

TechRadar – All the latest technology news

Google Bard can now speak loud and clear as update introduces speech feature

July 13, 2023

Google Bard is saying its first words thanks to a recent update that gives the AI the ability to read out generated responses in over 40 different languages. The newfound language support includes Arabic, Chinese, German, and Spanish.

According to Google, being able to hear text out loud can be helpful in learning the correct pronunciation of words. Activating Bard’s speech tool is pretty simple. All you have to do after entering a prompt is select the sound icon in the upper right-hand corner of a response. In addition to the voice, Google is expanding Bard’s availability to more global regions, most notably Brazil and Europe. It's important to point out the European Union initially forced the tech giant to postpone the chatbot’s launch “over privacy concerns”. But it looks like everything has been squared away with the EU.

Also, users can now adjust the “tone and style of Bard’s responses [across] five different options: simple, long, short, professional, or casual.” Google says this can be helpful in creating marketplace listings for businesses that want to maintain a certain voice. It’s reminiscent of the tone parameters on Microsoft’s SwiftKey app. However, unlike SwiftKey, it doesn’t appear Bard will make any cringe dad jokes if you ask it (shame).

Productivity boost

There is more to the update than just the language features. Google is also introducing some productivity tools. First, users can now finally pin Bard conversations in case they ever want to revisit them at a later time. If the AI gives you some helpful information, you can share the response with friends via shareable links. The chatbot creates a hyperlink that you can send over a messaging app or you can directly post the URL to LinkedIn, Facebook, Twitter, or Reddit.

Google is aware that people use “Bard for coding tasks.” To help these programmers, the company is adding a way to “export Python code to [the] Replit” platform. Lastly, the chatbot is gaining the “capabilities of Google Lens,” meaning you’ll be able to “upload images with prompts” to the AI. Bard will then analyze the photograph before providing the information you seek. This last feature can be found behind the Plus symbol next to the Prompt bar.

The addition of Google Lens to Bard is pretty exciting as the chatbot can now serve as a reverse image search engine of sorts.

Availability

Most of the update is currently online in the 40 different languages mentioned earlier but with a couple of exceptions. The five tones and Google Lens support can only be found in the English version of Bard. There are plans, however, to “expand to new languages soon.”

Although Google Bard managed to finally debut in the European Union, Canada remains absent from the list of countries supporting the chatbot. VPNs fortunately allow Canadians to bypass the block. If this affects you, be sure to check out TechRadar’s list of the best VPN service for Canada in 2023.

We speak to four winners of Apple’s 2021 App Awards

December 14, 2021

With iOS 15.2 and macOS 12.1 Monterey available to all, users can download the updates to their devices, alongside updating any existing apps to take advantage of what these updates bring.

Apple’s App Store Awards are the company’s way of highlighting developers whose apps stood out in their category, under a theme Apple calls “connections.” Whether that’s in video editing, streaming or games, the winning apps take advantage of recent features brought out by Apple and its software in an innovative way.

The company announced the year’s winners this month, with Carrot Weather, LumaFusion, DAZN, and League of Legends each winning in their categories for certain devices.

We spoke to the developers behind these apps to find out the challenges in designing the apps and their plans for the future.

Reflecting on their past

Every developer received an award that mirrors the App Store logo, made from 100% recycled aluminum. During a video announcing the winners, Tim Cook, Apple’s CEO, said, “From self-taught indie coders to inspiring leaders building global businesses, these standout developers innovated with Apple technology, with many helping to foster the profound sense of togetherness we needed this year.”

First up, LumaFusion is a video editing app on the iPad and iPhone for $19.99 / £19.99 / AU$19.99 that allows you to edit multiple videos at once, with transitions and features that make it easy to turn a video into an engaging narrative. Its improvements this year made it the winner of the iPad app of the year award.

LumaFusion’s developers, Terri Morgan and Chris Demiris, approach every Apple release, whether hardware or software, by asking how their users can benefit from it in the app. “We couldn’t imagine where the iPad would go after ten years. Now, with Thunderbolt support and the M1 chip, we always see how we can adapt these updates into LumaFusion. Some of these features are easy to implement, such as ProRes and external storage support, but we're always looking to see which features would benefit users most,” Morgan explains.

“We’re inspired by how so many have used the app to help follow their passion, especially during lockdown, and it does help drive us to make the app even better, and more widely available to other users on Apple devices.”

LumaFusion on iPad Pro — (Image credit: Lumafusion)

League of Legends: Wild Rift was the winner of the iPhone gaming app of the year. Made by Riot Games, it's one of the few franchises that has successfully made the jump from a PC game to mobile with no compromises.

Michael Chow, executive producer on the game, reflected on developing the game since the start. “Usually when a game makes the move to mobile, there's a lot of negativity from their communities, so we wanted to make sure we avoided it with Wild Rift.”

With our positive impressions of running Rift on an iPhone 12 Pro earlier this year, we wanted to know how Chow and the team felt about releasing the game after a very long beta-test period.

“We’ve spent the past year rolling out the game across the world, and the results are pretty stellar,” Chow exclaims. “It’s not been an easy journey, as we weren’t sure if it was physically possible to bring League of Legends to mobile, but the results speak for themselves.”

“We quit our day jobs to start the company, and with Apple’s relentless efforts to make the iPhone better, it couldn’t make us more proud to receive this award from the company.”

League of Legends: Wild Rift on an iPhone 12 Pro — (Image credit: Future)

DAZN is a streaming app for sports, and while it’s also available on iPhone and iPad, it was the Apple TV version that won DAZN its award. It allows subscribed users to watch sports such as MotoGP, UFC, UEFA, NFL and more for $19.99 / £19.99 / AU$19.99 a month.

Ben King, Director of DAZN at DAZN Group, explained to us that the aim of the app was to make it accessible, flexible and affordable to those who just wanted to easily access their sports for a price that didn’t lock them into two-year contracts.

“We’re absolutely honoured to receive this award from Apple, but it doesn’t mean we want to stop with how we can offer content to our users in way of features and more kinds of sports.”

The app uses push notifications to deliver the latest updates from other matches, such as red cards and goals, while you’re using another app. You can also watch three sports or games at once, mirroring a scene in Back to the Future Part II in which Marty Jr watches 16 channels at once.

DAZN on Apple TV, Mac and iPad. — (Image credit: DAZN)

Brian Mueller is the developer of Carrot Weather, which won the 2021 App Award for the Apple Watch. Its complications, which put certain weather forecasts on watch faces, alongside its push notifications for upcoming weather changes, have allowed Mueller to bring the app and its sass to the watch with no compromise.

“When the app launched in 2015, it was purely an entertainment app, with its achievements and Carrot’s personality,” Mueller explained. “It wasn’t until the Apple Watch arrived that forced me to focus on making a really great weather app, instead of relying on Carrot’s jokes and the bizarre imagery.”

As the app grew since watchOS 2, Mueller realised that he could add more complications to the watch faces. “I found out a workaround in early versions that could allow me to add more than the one complication per watch face that the operating system allowed,” Mueller reveals. “After this, users were asking me for certain weather sources to add to the watch faces, and I still love that, that fans of the app are giving me feedback to make the watch app better.”

Three variations of CARROT on an iPhone 12 — (Image credit: TechRadar)

Where next for these apps?

While these developers are celebrating their success, they aren't stopping. We asked what’s coming up for their apps in the near future.

“We have a long list of feature requests, and in the past there's been features such as CoreML and smart background removal. But we have to pick and choose each time to really focus on how they best fit for LumaFusion,” Morgan explains. “I can see us doing cooperative editing with SharePlay eventually, but in the immediate future, key-frame easing where you can bring in images to videos, alongside subtitling and speed ramping are all coming soon.”

We also wanted to know whether there were plans for LumaFusion coming to macOS natively. “While you can export a project to Final Cut, we’re aware that there’s a need for LumaFusion on macOS,” Demiris explains. “We are working on a more complete version for macOS to take advantage of what the Mac brings.”

A screenshot showing LumaFusion — (Image credit: Luma Touch LLC)

With League of Legends: Wild Rift, Chow was enthusiastic about how the on-screen controls work well on the iPhone. But we asked if keyboard support in games, a feature of iPadOS 15, would come to the game to help users control their character more easily on the bigger tablet.

“Control in Rift is something that we spent a lot of time on, so I don’t think we’ll implement keyboard support anytime soon,” Chow explains. “But gamepad support is something that could work, especially for the Apple TV, so who knows.”

With DAZN, SharePlay support is something that’s of interest to King and the rest of the team. “We’re all about flexibility, and while you can already join with friends in watching a game, SharePlay does bring something to the table. If enough users give feedback to us that it’s a feature they want on their iPad or Apple TV, it’s something we’ll consider for a future update.”

Finally, with accessibility a big part of Apple’s focus in software interaction, we asked King if there are upcoming features to help those with certain impairments when using DAZN.

“We have some really interesting ideas for accessibility,” King reveals. “We don’t have to give you one audio stream for instance, so there’s no reason for using sign language as an alternative commentary, but for the moment, subtitles and closed captions for pre-recorded content are something that we’re currently working on. But there’s far more options compared to cable content that we can provide to help cater to someone who is either visually or auditory impaired, and we want to help them.”
