AI-generated movies will be here sooner than you think – and this new Google DeepMind tool proves it

AI video generators like OpenAI's Sora, Luma AI's Dream Machine, and Runway Gen-3 Alpha have been stealing the headlines lately, but a new Google DeepMind tool could fix the one weakness they all share – a lack of accompanying audio.

A new Google DeepMind post reveals a video-to-audio (or 'V2A') tool that uses a combination of pixels and text prompts to automatically generate soundtracks and soundscapes for AI-generated videos. In short, it's another big step toward the creation of fully automated movie scenes.

As you can see in the videos below, this V2A tech can combine with AI video generators (including Google's Veo) to create an atmospheric score, timely sound effects, or even dialogue that Google DeepMind says “matches the characters and tone of a video”.

Creators aren't stuck with just one audio option, either – DeepMind's new V2A tool can apparently generate an “unlimited number of soundtracks for any video input”, which means you can nudge it toward your desired outcome with a few simple text prompts.

Google says its tool stands out from rival tech thanks to its ability to generate audio based on pixels alone – giving it a guiding text prompt is apparently optional. But DeepMind is also very aware of the major potential for misuse and deepfakes, which is why this V2A tool is being ringfenced as a research project – for now.

DeepMind says that “before we consider opening access to it to the wider public, our V2A technology will undergo rigorous safety assessments and testing”. It will certainly need to be rigorous, because the ten short video examples show that the tech has explosive potential, for both good and bad.

The potential for amateur filmmaking and animation is huge, as shown by the 'horror' clip below and another for a cartoon baby dinosaur. A Blade Runner-esque scene (below) of cars skidding through a city, set to an electronic score, also hints at how the tech could drastically reduce budgets for sci-fi movies.

Concerned creators will at least take some comfort from the obvious dialogue limitations shown in the 'Claymation family' video. But if the last year has taught us anything, it's that DeepMind's V2A tech will only get better from here.

Where we're going, we won't need voice actors

The combination of AI-generated videos with AI-created soundtracks and sound effects is a game-changer on many levels – and adds another dimension to an arms race that was already white hot.

OpenAI has already said it plans to add audio to its Sora video generator, which is due to launch later this year. But DeepMind's new V2A tool shows the tech is already at an advanced stage, capable of creating audio from video alone rather than needing endless prompting.

DeepMind's tool works using a diffusion model that combines information from the video's pixels with the user's text prompts to generate compressed audio, which is then decoded into an audio waveform. It was apparently trained on a combination of video, audio, and AI-generated annotations.
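
If you're curious how those pieces fit together, here's a minimal sketch in Python of a diffusion-style video-to-audio loop: encode the video (and an optional text prompt), iteratively denoise an audio latent under that conditioning, then decode the result into a waveform. Every function, shape, and constant below is a made-up stand-in for illustration – a toy of the general pattern DeepMind describes, not its actual model.

    import numpy as np

    rng = np.random.default_rng(0)

    def encode_video(frames):
        # Hypothetical visual encoder: pool pixel statistics into one vector.
        # A real system would use a learned network here.
        return frames.reshape(frames.shape[0], -1).mean(axis=0)

    def encode_text(prompt):
        # Hypothetical text encoder: hash characters into a 64-dim embedding.
        vec = np.zeros(64)
        for i, byte in enumerate(prompt.encode("utf-8")):
            vec[i % 64] += byte / 255.0
        return vec

    def denoise_step(latent, video_emb, text_emb, t):
        # Toy denoiser: nudge the noisy latent toward the conditioning signal.
        # In a real diffusion model, a trained network predicts the noise to remove.
        cond = np.tanh(video_emb[:latent.size] + text_emb[:latent.size])
        return latent + 0.1 * (cond - latent) * (1.0 - t)

    def generate_audio(frames, prompt="", steps=50, latent_size=64):
        video_emb = encode_video(frames)
        text_emb = encode_text(prompt)               # the prompt is optional, as in V2A
        latent = rng.standard_normal(latent_size)    # start from pure noise
        for step in range(steps):
            latent = denoise_step(latent, video_emb, text_emb, step / steps)
        # Placeholder "decoder": upsample the compressed latent into a waveform.
        return np.repeat(latent, 100)

    frames = rng.random((8, 16, 16))    # eight tiny grayscale frames
    waveform = generate_audio(frames, prompt="rain on a tin roof")
    print(waveform.shape)               # (6400,) – a toy audio signal

One detail worth noticing: the text prompt only shapes the conditioning signal, so the loop still produces audio when the prompt is left empty – mirroring DeepMind's claim that prompts are optional.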

Exactly what content this V2A tool was trained on isn't clear, but Google clearly has a big potential advantage in owning the world's biggest video-sharing platform, YouTube. YouTube's terms of service aren't completely clear on how its videos might be used to train AI, but YouTube CEO Neal Mohan recently told Bloomberg that some creators have contracts that allow their content to be used for training AI models.

Clearly, the tech has some limitations with dialogue and is a long way from producing a Hollywood-ready finished article. But it's already a potentially powerful tool for storyboarding and amateur filmmakers, and hot competition with the likes of OpenAI means it's only going to improve rapidly from here.

DeepMind and Meta staff plan to launch a new AI chatbot that could have the edge over ChatGPT and Bard

Since the explosion in popularity of large language model chatbots like ChatGPT, Google Gemini, and Microsoft Copilot, many smaller companies have tried to elbow their way into the scene. Reka, a new AI startup, is gearing up to take on artificial intelligence chatbot giants like Gemini (formerly known as Google Bard) and OpenAI's ChatGPT – and it may have a fighting chance of actually doing so.

The company is spearheaded by Singaporean scientist Yi Tay, and its headline model is Reka Flash, a multilingual language model trained on over 32 languages. Reka Flash boasts 21 billion parameters, with the company claiming it's competitive with Google Gemini Pro and OpenAI's GPT-3.5 across multiple AI benchmarks.

According to TechInAsia, the company has also released a more compact version of the model called Reka Edge, which offers 7 billion parameters and is aimed at specific use cases like running on-device. It's worth noting that ChatGPT and Google Gemini are built on far bigger models (approximately 175 billion and 137 billion parameters respectively), but those bots have been around for longer, and there are real benefits to more 'compact' AI models. Google, for instance, has Gemini Nano, a model designed to run on edge devices like smartphones using just 1.8 billion parameters – so Reka Edge comfortably outsizes it in that space.

So, who’s Yasa?

The model is available to the public in beta on the official Reka site. I’ve had a go at using it and can confirm that it's got a familiar ChatGPT-esque feel to the user interface and the way the bot responds. 

The bot introduced itself as Yasa, developed by Reka, and gave me an instant rundown of all the things it could do for me. It had the usual AI tasks down, like general knowledge, sharing jokes or stories, and solving problems.

Interestingly, Yasa noted that it can also assist in translation, and listed 28 languages it can swap between. While my understanding of written Hindi is rudimentary, I did ask Yasa to translate some words and phrases from English to Hindi and from Hindi to English. 

I was impressed not just by the accuracy of the translations, but also by the way Yasa explained how it got there, breaking down each word in the phrase or sentence and translating it word by word before giving the complete sentence. Response times were quick, no matter how long the prompt. Considering that non-English-language prompts have proven a weak spot for other popular AI chatbots, it's a solid showing – although Yasa is far from the only multilingual bot out there.

[Images: Reka translating; Reka AI Barbie (Image credit: Future)]

I also tried to figure out how up to date the bot's knowledge was, and eventually found its limit: it appears to have been trained on information that predates the release of the Barbie movie. A weird litmus test, I know, but when I asked it for some facts about the pink-tinted Margot Robbie feature, it spoke about it as an 'upcoming movie' and gave me the release date of July 28, 2023. So we appear to have the same situation as early ChatGPT, whose knowledge was limited to world events before 2022.

Of all the ChatGPT alternatives I've tried since the AI boom, Reka (or should I say, Yasa) is probably the most immediately impressive. While other AI betas feel clunky, and sometimes like poor man's knockoffs, Reka holds its own – not just with its visually pleasing user interface and easy setup, but with its multilingual capabilities and helpful, less robotic personality.
