Google’s Gemini AI can now handle bigger prompts thanks to next-gen upgrade

Google’s Gemini AI has only been around for two months at the time of this writing, and the company is already launching its next-generation model, dubbed Gemini 1.5.

The announcement post gets into the nitty-gritty, explaining all the AI’s improvements in detail. It’s all rather technical, but the main takeaway is that Gemini 1.5 will deliver “dramatically enhanced performance.” This was accomplished by adopting a “Mixture-of-Experts” architecture (MoE for short), which splits the model into smaller, specialized “expert” networks and activates only the ones most relevant to a given input. This structure makes Gemini easier to train and quicker at learning complicated tasks than before.
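For a rough sense of what MoE routing means in practice, here is a toy sketch in Python (my own illustration, not Google’s implementation; the “experts” here are stand-in matrices):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: 4 "experts" over an 8-dimensional input.
    # In a real MoE transformer, each expert is a feed-forward sub-network.
    n_experts, d_model, top_k = 4, 8, 2
    experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
    gate = rng.normal(size=(d_model, n_experts))

    def moe_layer(x):
        # A gating network scores every expert for this particular input...
        scores = x @ gate
        probs = np.exp(scores) / np.exp(scores).sum()
        # ...but only the top-k experts are actually run, which is what
        # makes the model cheaper to train and faster to serve.
        chosen = np.argsort(probs)[-top_k:]
        out = sum(probs[i] * (x @ experts[i]) for i in chosen)
        return out / probs[chosen].sum()  # renormalize over chosen experts

    token = rng.normal(size=d_model)
    print(moe_layer(token).shape)  # (8,), same shape as the input

The point is selective activation: the full model holds many experts, but each input only pays the compute cost of the few the gate picks.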

There are plans to roll out the upgrade to all three major versions of the AI, but the only one being released today for early testing is Gemini 1.5 Pro. 

What’s unique about the model is its “context window of up to 1 million tokens”. Tokens, in generative AI, are the smallest pieces of data LLMs (large language models) use “to process and generate text.” Bigger context windows let the AI handle more information at once, and a million tokens is huge, far exceeding what GPT-4 Turbo can do. OpenAI’s engine, for comparison, caps out at a 128,000-token context window.
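To make those numbers concrete, here is a minimal sketch using OpenAI’s open-source tiktoken tokenizer (Gemini uses its own tokenizer, so counts will differ; the file name is hypothetical):

    # pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # the encoding GPT-4 uses

    text = open("apollo11_transcript.txt").read()  # hypothetical document
    n_tokens = len(enc.encode(text))

    print(f"{n_tokens:,} tokens")
    print("Fits GPT-4 Turbo (128k)?", n_tokens <= 128_000)
    print("Fits Gemini 1.5 Pro (1M)?", n_tokens <= 1_000_000)

As a rule of thumb, a token is roughly three-quarters of an English word, so a million tokens is on the order of 700,000 words.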

Gemini Pro in action

With all these numbers being thrown around, the question is: what does Gemini 1.5 Pro actually look like in action? Google made several videos showcasing the AI’s abilities. Admittedly, it’s pretty interesting stuff, as they reveal how the upgraded model can analyze and summarize large amounts of text according to a prompt.

In one example, Google gave Gemini 1.5 Pro the 400-plus-page transcript of the Apollo 11 moon mission to show the AI could “understand, reason about, and identify” details in the document. The prompter asked the AI to locate “comedic moments” during the mission, and within about 30 seconds, Gemini 1.5 Pro found a few jokes the astronauts cracked while in space, identified who told each one, and explained any references they made.
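For developers in the preview, a request along those lines might look something like this via Google’s google-generativeai Python package (the model identifier and file path below are assumptions; check the AI Studio docs for the exact names):

    # pip install google-generativeai
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")

    # Model name is an assumption for the 1.5 Pro preview.
    model = genai.GenerativeModel("gemini-1.5-pro-latest")

    transcript = open("apollo11_transcript.txt").read()  # hypothetical path

    response = model.generate_content([
        transcript,
        "Find three comedic moments in this mission transcript, note who "
        "said each line, and explain any references being made.",
    ])
    print(response.text)

The long context window is what makes this workable: the entire transcript travels in the prompt, with no chunking or retrieval pipeline needed.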

These analysis skills extend to other modalities. In another demo, the dev team gave the AI a 44-minute Buster Keaton movie, uploaded a rough sketch of a gushing water tower, and then asked for the timestamp of the scene involving a water tower. Sure enough, the model found the exact moment ten minutes into the film. Keep in mind this was done without any explanation of the drawing itself or any other text besides the question: Gemini 1.5 Pro understood it was a water tower without extra help.

Experimental tech

The model is not available to the general public at the moment. It’s currently offered as a free early preview to “developers and enterprise customers” through Google’s AI Studio and Vertex AI platforms. The company warns testers they may experience high latency since the model is still experimental, though there are plans to improve speeds down the line.

We reached out to Google asking for information on when people can expect the launch of Gemini 1.5 and Gemini 1.5 Ultra plus the wider release of these next-gen AI models. This story will be updated at a later time. Until then, check out TechRadar's roundup of the best AI content generators for 2024.


Apple could be working on a new AI tool that animates your images based on text prompts

Apple may be working on a new artificial intelligence tool that will let you create basic animations from your photos using a simple text prompt. If the tool comes to fruition, you’ll be able to turn any static image into a brief animation just by typing in what you want it to look like. 

According to 9to5Mac, Apple researchers have published a paper detailing procedures for manipulating image graphics using text commands. The tool, Apple Keyframer, would use natural-language text to tell the proposed AI system how to manipulate a given image and animate it.

Say you have a photo of the view from your window, with trees in the background and even cars driving past. From what the paper suggests, you’ll be able to type commands such as ‘make the leaves move as if windy’ into the Keyframer tool, which will then animate the specified part of your photo.

You may recognize the name ‘keyframe’ if you’re an Apple user, as it’s already part of Apple’s Live Photos feature, which lets you scrub through a Live Photo clip and select which frame (the key frame) you want to be the actual still image for the photo.

Better late than never? 

Apple has been notably slow to jump on the AI bandwagon, but that’s not exactly surprising. The company is known to play the long game and let others iron out the kinks before it makes its move, as we’ve seen with its recent foray into mixed reality with the Apple Vision Pro (this is also why I have hope for a foldable iPhone coming soon).

I’m quite excited for the Keyframer tool, if it does come to fruition, because it would put basic animation tools into the hands of every iPhone user, including those who wouldn’t know where to start with animation, let alone how to make their photos move.

Overall, the direction Apple is taking with AI tools seems to be a positive one. The Keyframer tool comes right on the heels of Apple’s AI-powered image editing tool, which again reinforces a move toward improving user experience rather than just putting out things that mirror the competition from companies like OpenAI, Microsoft, and Google.

I’m personally glad that Apple’s dive into the world of artificial intelligence tools isn’t just another AI chatbot like ChatGPT or Google Gemini, but rather a focus on tools that offer unique new features for iOS and macOS products. While this project is in its very early stages, I’m still pretty hyped about the idea of making funny little clips of my cat being silly, or creating moving memories of my friends, with just a few word prompts.

As for when we’ll get our hands on Keyframer, unfortunately there’s no release date in sight just yet – but based on previous feature launches, Apple willingly revealing details at this stage indicates that it’s probably not too far off, and more importantly isn’t likely to get tossed aside. After all, Apple isn’t Google.


Microsoft Copilot’s new AI tool will turn your simple prompts into songs

Thanks to a new partnership with music creation platform Suno, Microsoft Copilot can now generate short-form songs from a single text prompt.

The content it creates consists not only of instrumentals but also of fleshed-out lyrics and actual singing voices. Microsoft states in the announcement that you don’t need any pre-existing music-making skills; all you need is an idea in your head. If any of this sounds familiar, that’s because Meta and Google have their own versions of this technology in MusicGen and Instrument Playground, respectively. Those two function similarly, although each runs on its maker’s own proprietary AI model rather than a third-party one.

How to use the Suno plugin

To use this feature, you’ll first have to launch Microsoft Edge, as the update is exclusive to that browser. Then head over to the Copilot website, sign in, and click the Plugin tab in the top right corner. Make sure the Suno plugin is toggled on.

[Image: The Suno plugin in Copilot (Image credit: Future)]

Once everything is in place, enter a text prompt into Copilot and give it time to finish; the AI can take a while to create something from a prompt. In our experience, it took Copilot about ten minutes to write lyrics for a pop song about having an adventure with your family. Strangely, we didn’t receive any audio.

Copilot told us it had created a link to Suno’s official website where we could listen to the track, but the URL disappeared the moment the response finished. We then prompted the AI to generate another song; however, it only wrote lyrics again. When asked where the audio was, Copilot told us to imagine the melody in our heads or sing the words out loud.

This is the first time we’ve had a music-generative AI flat-out refuse to produce audio.

[Image: Microsoft Copilot refusing to generate audio (Image credit: Future)]

Good performance… when it works

From there, we went to Suno’s website to get an idea of what the tech can do, and the audio genuinely sounded great in our experience. The vocal performances were surprisingly good, although not amazing: the vocals aren’t total gibberish like Google’s Instrument Playground produces, but they’re not super clear either.

We weren’t able to judge Copilot’s music-making skills for ourselves, but if it’s anything like the base Suno model, the content it creates should outshine anything MusicGen or Instrument Playground can churn out.

The rollout of the Suno plugin has already begun and will continue over the coming weeks. There’s no word on whether Microsoft plans to expand the feature to other browsers, although we did reach out to ask if this is in the works and if Microsoft is going to address the issues we encountered; we would’ve loved to hear the music. This story will be updated at a later time.

In the meantime, check out TechRadar's list of the best free music-making software in 2023.


Forget ChatGPT – NExT-GPT can read and generate audio and video prompts, taking generative AI to the next level

2023 has felt like a year dedicated to artificial intelligence and its ever-expanding capabilities, but the era of pure text output is already losing steam. The AI scene might be dominated by giants like ChatGPT and Google Bard, but a new large language model (LLM), NExT-GPT, is here to shake things up – offering the full bounty of text, image, audio, and video output. 

NExT-GPT is the brainchild of researchers from the National University of Singapore and Tsinghua University. Pitched as an ‘any-to-any’ system, it can accept inputs in different formats and respond in whichever format is desired: video, audio, image, or text. This means you can put in a text prompt and have NExT-GPT process it into a video, or give it an image and have that converted into audio.

ChatGPT has only just gained the ability to ‘see, hear and speak’, which is similar to what NExT-GPT offers, but ChatGPT is going for a more mobile-friendly version of this kind of feature and has yet to introduce video capabilities.

We’ve seen a lot of ChatGPT alternatives and rivals pop up over the past year, but NExT-GPT is one of the few LLMs we’ve seen so far that can match the text-based output of ChatGPT but also provide outputs beyond what OpenAI’s popular chatbot can currently do. You can head over to the GitHub page or the demo page to try it out for yourself. 

So, what is it like?

I’ve fiddled around with NExT-GPT on the demo site and I have to say I’m impressed, but not blown away. Of course, this is not a polished product that has the advantages of public feedback, multiple updates, and so on – but it is still very good. 

I asked it to turn a photo of my cat Miso into an image of him as a librarian, and I was pretty happy with the result. It may not be at the same level of quality as established image generators like Midjourney or Stable Diffusion, but it was still an undeniably very cute picture.

[Image: A cat in a library wearing glasses. This is probably one of the least cursed images I’ve personally generated using AI. (Image credit: Future via NExT-GPT)]

I also tested out the video and audio features, but those didn’t go quite as well as the image generation. The generated videos were, again, not awful, but they had the very obvious ‘made by AI’ look that afflicts a lot of generated images and videos, with everything looking a little distorted and wonky. It was uncanny.

Overall, there’s a lot of potential for this LLM to fill the audio and video gaps left by big AI names like OpenAI and Google. I do hope that as NExT-GPT gets better and better, we’ll see higher-quality outputs and be able to seamlessly make excellent home movies of our cats in no time.
