Gemini’s next evolution could let you use the AI while you browse the internet

Gemini may receive a big update on mobile in the near future that adds several new features, including a text box overlay. Details of the upgrade come from industry insider AssembleDebug, who shared his findings with a couple of publications.

PiunikaWeb got a look at the overlay, and it's quite fascinating to see in action. It converts the AI's input box into a small floating window at the bottom of the smartphone display, which stays there even if you close the app. You could, for example, talk to Gemini while browsing the internet or checking your email.

AssembleDebug was able to activate the window and get it working on his phone while on X (the platform formerly known as Twitter). His demo video shows it behaving exactly like the Gemini app: you ask the AI a question, and after a few seconds a response comes out, complete with source links, images, and even YouTube videos if the query calls for them. Answers can grow long enough to obscure the app behind them.

AssembleDebug's video reveals that the length of the response depends on whether the question calls for a long-form answer. We should also mention that the overlay is multimodal, so you can type out a question, issue a voice command, or upload an image.
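For the technically curious, Android already exposes the plumbing for this kind of persistent input box. Below is a minimal, hypothetical sketch of how a floating overlay is typically built with Android's WindowManager. Google hasn't published how Gemini's overlay actually works, and the service here is purely our own illustration; note that it requires the "Display over other apps" (SYSTEM_ALERT_WINDOW) permission:

```kotlin
import android.app.Service
import android.content.Context
import android.content.Intent
import android.graphics.PixelFormat
import android.os.IBinder
import android.view.Gravity
import android.view.WindowManager
import android.widget.EditText

// Hypothetical service: a text box pinned to the bottom of the screen that
// survives app switches, similar to the overlay in AssembleDebug's demo.
class FloatingInputService : Service() {

    private lateinit var windowManager: WindowManager
    private lateinit var inputBox: EditText

    override fun onCreate() {
        super.onCreate()
        windowManager = getSystemService(Context.WINDOW_SERVICE) as WindowManager
        inputBox = EditText(this).apply { hint = "Ask a question…" }

        val params = WindowManager.LayoutParams(
            WindowManager.LayoutParams.MATCH_PARENT,
            WindowManager.LayoutParams.WRAP_CONTENT,
            // TYPE_APPLICATION_OVERLAY draws above other apps, which is what
            // lets the box stay put while you browse or check email.
            WindowManager.LayoutParams.TYPE_APPLICATION_OVERLAY,
            WindowManager.LayoutParams.FLAG_NOT_TOUCH_MODAL,
            PixelFormat.TRANSLUCENT
        ).apply { gravity = Gravity.BOTTOM } // pin to the bottom edge

        windowManager.addView(inputBox, params)
    }

    override fun onDestroy() {
        windowManager.removeView(inputBox)
        super.onDestroy()
    }

    override fun onBind(intent: Intent?): IBinder? = null
}
```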

Smarter AI

The other notable changes were shared with Android Authority. First, Gemini on Android will gain the ability to accept file types other than photographs. Images show a tester uploading a PDF and then asking the AI to summarize the text inside it. Apparently, the feature is present in the current version of Gemini; however, activating it doesn't do anything. Android Authority speculates the update may be exclusive to Google Workspace or Gemini Advanced, or perhaps both – it's hard to tell at the moment.
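To give a sense of what such a request looks like in code, here's a hypothetical sketch using Google's Kotlin client SDK for the Gemini API (com.google.ai.client.generativeai). It's purely illustrative: the consumer Gemini app's internals aren't public, and we're assuming the SDK's blob content part will accept a PDF here:

```kotlin
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content
import java.io.File

// Hypothetical helper: send a PDF plus an instruction, return the summary.
suspend fun summarizePdf(apiKey: String, pdf: File): String? {
    val model = GenerativeModel(modelName = "gemini-1.5-pro", apiKey = apiKey)
    val response = model.generateContent(
        content {
            // Attach the raw PDF bytes alongside the text prompt.
            blob("application/pdf", pdf.readBytes())
            text("Summarize the text inside this document.")
        }
    )
    return response.text
}
```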

Second is a pretty basic, but useful, tool called Select Text. The way Gemini works right now, you're forced to copy a whole block of text even if you only want a small portion. Select Text solves this issue by letting you grab a specific line or paragraph.

Yeah, it's not a flashy upgrade – almost every app in the world has the same capability. Yet the tool reportedly has "huge implications for Gemini's usability", greatly improving the AI's ease of use by being far less restrictive.


A fourth, smaller update was found by AssembleDebug. It's known as Real-time Responses, and the descriptor text found alongside it claims the tool lets you see answers being written out in real time. However, as PiunikaWeb points out, it's only an animation change with no "practical benefits." Instead of waiting for Gemini to deliver a response as one solid block, you can choose to watch the AI write everything out line by line, similar to its desktop counterpart.
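That lines up with how Gemini's developer-facing APIs already work: responses can be streamed chunk by chunk and rendered as they arrive. Here's a minimal sketch with the Kotlin client SDK – again, an illustration of the general mechanism, not the app's actual code:

```kotlin
import com.google.ai.client.generativeai.GenerativeModel
import kotlinx.coroutines.flow.collect

// Print an answer incrementally as chunks arrive. The full answer is the
// same either way; streaming only changes when each piece becomes visible,
// matching PiunikaWeb's point that this is an animation change.
suspend fun streamAnswer(apiKey: String, prompt: String) {
    val model = GenerativeModel(modelName = "gemini-1.5-pro", apiKey = apiKey)
    model.generateContentStream(prompt).collect { chunk ->
        print(chunk.text ?: "") // append each chunk as it streams in
    }
}
```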

Google I/O 2024 kicks off in about three weeks on May 14. No word on when these features will roll out, but we'll learn a lot more during the event.

While you wait, check out TechRadar's roundup of the best Android smartphones for 2024 if you're looking to upgrade.


Google Gemini’s new Calendar capabilities take it one step closer to being your ultimate personal assistant

Google's new family of generative artificial intelligence (AI) models, Gemini, will soon be able to access events scheduled in Google Calendar on Android phones.

According to 9to5Google, Calendar events were on the "things to fix ASAP" list from Jack Krawczyk, Google's Senior Director of Product Management for Gemini Experiences, covering what Google would be working on to make Gemini a better-equipped digital assistant.

Users who have the Gemini app on an Android device can now expect Gemini to respond to voice or text prompts like "Show me my calendar" and "Do I have any upcoming calendar events?" When 9to5Google tried this the week before, Gemini responded that it couldn't fulfill those types of requests – which was particularly noticeable, as such requests are commonplace with rival (non-AI) digital assistants such as Siri or Google Assistant. When the same prompts were attempted this week, however, Gemini opened the Google Calendar app and fulfilled the requests. To enter a new event, it seems you need to tell Gemini something like "Add an event to my calendar," after which it should prompt you to fill out the details using voice commands.
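As for the hand-off itself, one standard way an Android app can pass an "add event" request to the Calendar app is the CalendarContract insert intent, which opens Calendar's editor pre-filled with the details. Whether Gemini uses this exact mechanism isn't public; the sketch below just shows the conventional route:

```kotlin
import android.content.Context
import android.content.Intent
import android.provider.CalendarContract

// Hypothetical helper: open the Calendar app's "new event" editor with the
// title and times already filled in, leaving the user to confirm.
fun insertCalendarEvent(context: Context, title: String, beginMillis: Long, endMillis: Long) {
    val intent = Intent(Intent.ACTION_INSERT).apply {
        data = CalendarContract.Events.CONTENT_URI
        putExtra(CalendarContract.Events.TITLE, title)
        putExtra(CalendarContract.EXTRA_EVENT_BEGIN_TIME, beginMillis)
        putExtra(CalendarContract.EXTRA_EVENT_END_TIME, endMillis)
    }
    context.startActivity(intent)
}
```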

(Image: Google Calendar. Image credit: Shutterstock)

Going all in on Gemini

Google is clearly making progress in setting up Gemini as its proprietary all-in-one AI offering (including as a digital assistant that replaces Google Assistant in the future). It has quite a few steps to take before it manages that, with users asking for features like the ability to play music or edit their shopping lists via Gemini. Another significant hurdle for Gemini to clear if it wants to become popular is that it's only available in the United States for now.

The race to become the best AI assistant has gotten a little more intense recently between Microsoft with Copilot, Google with Gemini, and Amazon with Alexa. Google did recently make some pretty big strides in compressing its larger Gemini models so they can run on mobile devices, and these more complex models sound like they could give Gemini's capabilities a major boost. Google Assistant is already widely recognized, and this is another feather in Google's cap. I feel hesitant about betting on any single one of these digital AI assistants, but if Google continues at this pace with Gemini, I think its chances are pretty good.


Google isn’t done trying to demonstrate Gemini’s genius and is working on integrating it directly into Android devices

Google's newly reworked and rebranded family of generative artificial intelligence models, Gemini, may still be very much at the beginning of its development journey, but Google is making big plans for it. The company is planning to integrate Gemini into Android software for phones, and users are predicted to be able to access it offline in 2025, according to Brian Rakowski, a top executive at Google's Pixel division.

Gemini is a series of large language models designed to understand and generate human-like text and more; the most compact and efficient of these is Gemini Nano, intended for on-device tasks. This is the model that's currently built and adapted to run on Pixel phones and other capable Android devices. According to Rakowski, Gemini Nano's larger sibling models, which require an internet connection to run (as they only live in Google's data centers), are the ones expected to be integrated into new Android phones starting next year.

Google has been able to do this thanks to recent breakthroughs in engineers' ability to compress these bigger and more complex models to a size that's feasible for use on smaller devices. One of these larger sibling models is Gemini Ultra, considered a key competitor to OpenAI's premium GPT-4 model, and the compressed version of it will be able to run on an Android phone with no extra assistance.
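Google hasn't detailed how that compression is done, but shrinking big models typically leans on techniques like quantization – storing weights in fewer bits. Here's a toy sketch of symmetric 8-bit quantization, purely to illustrate the idea:

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Toy symmetric 8-bit quantization: map each float weight to a signed byte
// plus one shared scale factor, cutting storage roughly 4x versus float32.
fun quantize(weights: FloatArray): Pair<ByteArray, Float> {
    val maxAbs = weights.maxOf { abs(it) } // largest-magnitude weight
    val scale = if (maxAbs == 0f) 1f else maxAbs / 127f
    val quantized = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return quantized to scale // recover later as quantized[i] * scale
}
```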

This would mean users could access the processing power that Google is offering with Gemini whether they’re connected to the internet or not, potentially improving their day-to-day experience with it. It also means whatever you enter into Gemini wouldn’t necessarily have to leave your phone for Gemini to process it (if Google wills it, that is), thereby making it easier to keep your entries and information private – cloud-based AI tools have been criticized in the past for having inferior digital security compared to locally-run models. Rakowski told CNBC that what users will experience on their devices will be “instantaneous without requiring a connection or subscription.”

(Image: Three Android phones on an orange background showing the Google Gemini Android app. Image credit: Future)

A potential play to win users' favor 

MSPowerUser points out that the smartphone market has cooled down as of late, and some manufacturers might be trying to capture potential buyers' attention by offering devices capable of utilizing what modern AI has to offer. While AI is an incredibly rich and intriguing area of research and novelty, it might not be enough to convince people to swap their old phone (which may already be capable of running something like Gemini or ChatGPT) for a new one. The AI makers hoping to raise trillions of dollars in funding are likely to offer versions that run on existing devices so people can try them for themselves, and my guess is that will satisfy most people's AI appetites for now.

Google, Microsoft, Amazon, and others are all trying to develop their own AI models and assistants to become the first to reap the rewards. Right now, AI models are extremely impressive and often surprising, and they can help you at work (although caution should be heavily exercised if you do this), but their initial novelty is currently the biggest draw they have.

These tools will have to demonstrate continuous quality-of-life improvements if they're to make the kind of impression they're aiming for. I do believe that making these models widely available on users' devices and giving users the option to use them offline are steps that could pay off for Google in the long run – and I would like to see other tech giants follow in its path.


Google explains how Gemini’s AI image generation went wrong, and how it’ll fix it

A few weeks ago, Google launched a new image generation tool for Gemini (the suite of AI tools formerly known as Bard and Duet) that allowed users to generate all sorts of images from simple text prompts. Unfortunately, Google's AI tool repeatedly missed the mark, generating inaccurate and even offensive images that led a lot of us to wonder: how did the bot get things so wrong? Well, the company has finally released a statement explaining what went wrong, and how it plans to fix Gemini.

The official blog post addressing the issue states that when designing the text-to-image feature for Gemini, the team behind Gemini wanted to “ensure it doesn’t fall into some of the traps we’ve seen in the past with image generation technology — such as creating violent or sexually explicit images, or depictions of real people.” The post further explains that users probably don’t want to keep seeing people of just one ethnicity or other prominent characteristic. 

So, to offer a pretty basic explanation of what's been going on: Gemini has been throwing up images of people of color when prompted to generate images of white historical figures, giving users "diverse Nazis", or simply ignoring the part of a prompt where the user specified exactly what they were looking for. Gemini's image capabilities are currently on hold, but when you could access the feature, you'd specify exactly who you were trying to generate – Google uses the example "a white veterinarian with a dog" – and Gemini would seemingly ignore the first half of that prompt and generate veterinarians of every race except the one you asked for.

Google went on to explain that this was the outcome of two crucial failings. Firstly, Gemini's tuning to show a range of different people failed to account for cases that clearly should not show a range. Alongside that, in trying to make a more conscious, less biased generative AI, Google admits the "model became way more cautious than we intended and refused to answer certain prompts entirely – wrongly interpreting some very anodyne prompts as sensitive."

So, what's next?

At the time of writing, the ability to generate images of people on Gemini has been paused while the Gemini team works to fix the inaccuracies and carry out further testing. The blog post notes that AI ‘hallucinations’ are nothing new when it comes to complex deep learning models – even Bard and ChatGPT had some questionable tantrums as the creators of those bots worked out the kinks. 

The post ends with a promise from Google to keep working on Gemini's AI-powered people generation until everything is sorted, with the note that while the team can't promise it won't ever generate "embarrassing, inaccurate or offensive results", action is being taken to make sure that happens as little as possible.

All in all, this whole episode puts into perspective that AI is only as smart as we make it. Our editor-in-chief Lance Ulanoff succinctly noted that “When an AI doesn't know history, you can't blame the AI.” With how quickly artificial intelligence has swooped in and crammed itself into various facets of our daily lives – whether we want it or not – it’s easy to forget that the public proliferation of AI started just 18 months ago. As impressive as the tools currently available to us are, we’re ultimately still in the early days of artificial intelligence. 

We can't rain on Google Gemini's parade just because its mistakes were more visually striking than, say, ChatGPT's recent gibberish-filled meltdown. Google's temporary pause and reworking will ultimately lead to a better product, and sooner or later we'll see the tool as it was meant to be.
