Google’s Gemini AI has only been around for two months at the time of this writing, and already, the company is launching its next-generation model dubbed Gemini 1.5.
The announcement post gets into the nitty-gritty explaining all the AI’s improvements in detail. It’s all rather technical, but the main takeaway is that Gemini 1.5 will deliver “dramatically enhanced performance.” This was accomplished with the implementation of a “Mixture-of-Experts architecture” (or MoE for short) which sees multiple AI models working together in unison. Implementing this structure made Gemini easier to train as well as faster at learning complicated tasks than before.
There are plans to roll out the upgrade to all three major versions of the AI, but the only one being released today for early testing is Gemini 1.5 Pro.
What’s unique about it is the model has “a context window of up to 1 million tokens”. Tokens, as they relate to generative AI, are the smallest pieces of data LLMs (large language models) use “to process and generate text.” Bigger context windows allow the AI to handle more information at once. And a million tokens is huge, far exceeding what GPT-4 Turbo can do. OpenAI’s engine, for the sake of comparison, has a context window cap of 128,000 tokens.
Gemini Pro in action
With all these numbers being thrown, the question is what does Gemini 1.5 Pro look like in action? Google made several videos showcasing the AI’s abilities. Admittedly, it’s pretty interesting stuff as they reveal how the upgraded model can analyze and summarize large amounts of text according to a prompt.
In one example, they gave Gemini 1.5 Pro the over 400-page transcript of the Apollo 11 moon mission. It showed the AI could “understand, reason about, and identify” certain details in the document. The prompter asks the AI to locate “comedic moments” during the mission. After 30 seconds, Gemini 1.5 Pro managed to find a few jokes that the astronauts cracked while in space, including who told it and explained any references made.
These analysis skills can be used for other modalities. In another demo, the dev team gave the AI a 44-minute Buster Keaton movie. They uploaded a rough sketch of a gushing water tower and then asked for the timestamp of a scene involving a water tower. Sure enough, it found the exact part ten minutes into the film. Keep in mind this was done without any explanation about the drawing itself or any other text besides the question. Gemini 1.5 Pro understood it was a water tower without extra help.
The model is not available to the general public at the moment. Currently, it’s being offered as an early preview to “developers and enterprise customers” through Google’s AI Studio and Vertex AI platforms for free. The company is warning testers they may experience long latency times since it is still experimental. There are plans, however, to improve speeds down the line.
We reached out to Google asking for information on when people can expect the launch of Gemini 1.5 and Gemini 1.5 Ultra plus the wider release of these next-gen AI models. This story will be updated at a later time. Until then, check out TechRadar's roundup of the best AI content generators for 2024.