OpenAI’s GPT-4o ChatGPT assistant is more life-like than ever, complete with witty quips

So no, OpenAI didn’t roll out a search engine competitor to take on Google at its May 13, 2024 Spring Update event. Instead, OpenAI unveiled GPT-4 Omni (or GPT-4o for short) with human-like conversational capabilities, and it's seriously impressive. 

Beyond making this version of ChatGPT faster and free to more folks, GPT-4o expands how you can interact with it, including having natural conversations via the mobile or desktop app. Considering it's arriving on iPhone, Android, and desktop apps, it might pave the way to be the assistant we've all always wanted (or feared). 

OpenAI's ChatGPT-4o is more emotional and human-like

OpenAI demoing GPT-4o on an iPhone during the Spring Update event.

OpenAI demoing GPT-4o on an iPhone during the Spring Update event. (Image credit: OpenAI)

GPT-4o has taken a significant step towards understanding human communication in that you can converse in something approaching a natural manner. It comes complete with all the messiness of real-world tendencies like interrupting, understanding tone, and even realizing it's made a mistake.

During the first live demo, the presenter asked for feedback on his breathing technique. He breathed heavily into his phone, and ChatGPT responded with the witty quip, “You’re not a vacuum cleaner.” It advised on a slower technique, demonstrating its ability to understand and respond to human nuances.

So yes, ChatGPT has a sense of humor but also changes the tone of responses, complete with different inflections while conveying a “thought”. Like human conversations, you can cut the assistant off and correct it, making it react or stop speaking. You can even ask it to speak in a certain tone, style, or robotic voice. Furthermore, it can even provide translations.

In a live demonstration suggested by a user on X (formerly Twitter), two presenters on stage, one speaking English and one speaking Italian, had a conversation with Chat GPT-4o handling translation. It could quickly deliver the translation from Italian to English and then seamlessly translate the English response back to Italian.

It’s not just voice understanding with GPT-4o, though; it can also understand visuals like a written-out linear equation and then guide you through how to solve it, as well as look at a live selfie and provide a description. That could be what you're wearing or your emotions. 

In this demo, GPT said the presenter looked happy and cheerful. It’s not without quirks, though. At one point ChatGPT said it saw the image of the equation before it was even written out, referring back to a previous visual of just a wooden tabletop.

Throughout the demo, ChatGPT worked quickly and didn't really struggle to understand the problem or ask about it. GPT-4o is also more natural than typing in a query, as you can speak naturally to your phone and get a desired response – not one that tells you to Google it.  

A little like “Samantha” in “Her”

If you’re thinking about Her or another futuristic-dystopian film with an AI, you’re not the only one. Speaking with ChatGPT in such a natural way is essentially the Her moment for OpenAI. Considering it will be rolling out to the mobile app and as a desktop app for free, many people may soon have their own Her moments.

The impressive demos across speech and visuals feel may only be scratching the surface of what's possible. Overall performance and how well GPT-4o performs day-to-day in various environments remains to be seen, and once available, TechRadar will be putting it through the test. Still, after this peek, it's clear that GPT-4o is preparing to take on the best Google and Apple have to offer in their eagerly-anticipated AI reveals.

The outlook on GPT-4o

However, announcing this the day before Google I/O kicks off and just a few weeks after we’ve seen new AI gadgets hit the scene – like the Rabbit R1 – OpenAI is giving us a taste of truly useful AI experiences we want. If this rumored partnership with Apple comes to fruition, Siri could be supercharged, and Google will almost certainly show off its latest AI tricks at I/O on May 14, 2024. But will they be enough?

We wish OpenAI showed off a bit more live demos with the latest ChatGPT-4o in what turned out to be a jam-packed, less-than-30-minute keynote. Luckily, it will be rolling out to users in the coming week, and you won’t have to pay to try it out.

You Might Also Like

TechRadar – All the latest technology news

Read More

Microsoft’s VASA-1 AI video generation system can make lifelike avatars that speak volumes from a single photo

AI-generated video is already a reality, and now another player has joined the fray: Microsoft. Apparently, the tech giant has developed a generative AI system that can whip up realistic talking avatars from a single picture and an audio clip. The tool is named VASA-1, and it goes beyond mimicking mouth movement; it can capture lifelike emotions and produce natural-looking movements as well.

The system offers its user the ability to modify the subject’s eye movements, the distance the subject is being perceived at, and the emotions expressed. VASA-1 is the first model in what is rumored to be a series of AI tools, and MSPowerUser reports that it can conjure up specific facial expressions, synchronize lip movements to a high degree, and produce human-like head motions. 

It can offer a wide range of emotions to choose from and generate facial subtleties, which sounds like it could make for a scarily convincing result. 

How VASA-1 works and what it's capable of

Seemingly taking a note from how human 3D animators and modelers work, VASA-1 makes use of a process it calls ‘disentanglement,’ allowing the system to control and edit the facial expressions, 3D head position, and facial features independently of each other, and this is what powers VASA-1’s realism.

As you might be imagining already, this has seismic potential, offering the possibility to totally change our experiences of digital apps and interfaces. According to MSPowerUser, VASA-1 can produce videos unlike those that it was trained on. Apparently, the system wasn’t trained on artistic photos, singing voices, or non-English speech, but if you request a video that features one of these, it’ll oblige. 

The Microsoft researchers behind VASA-1 praise its real-time efficiency, stating that the system can make fairly high-resolution videos (512×512 pixels) with high frame rates. Frame rate, or frames per second (fps), is the frequency at which a series of images (referred to as frames) can be captured or displayed in succession within a piece of media. The researchers claim that VASA-1 can generate videos with 45fps in offline mode, and 40fps with online generation. 

You can check out the state of VASA-1 and learn more about it on Microsoft’s dedicated webpage for the project. It has several demonstrations and includes links to download information about it, ending with a section headlined ‘Risks and responsible AI considerations.’

Works like magic – but is it a miracle spell or a recipe for disaster?

In this final reflective section, Microsoft acknowledges that a tool like this has plentiful scope for misuse, but the researchers try to emphasize the potential positives of VASA-1. They’re not wrong; a technology like this could mean next-level educational experiences that are available to more students than ever before, better assistance to people who have difficulties communicating, the capability to provide companionship, and improved digital therapeutic support. 

All of that said, it would be foolish to ignore the potential for harm and wrongdoing with something like this. Microsoft does state that it doesn’t currently have plans to make VASA-1 available in any form to the public until it’s reassured that “the technology will be used responsibly and in accordance with proper regulations.” If Microsoft sticks to this ethos, I think it could be a long wait. 

All in all, I think it’s becoming hard to deny that generative AI video tools are going to become more commonplace and the countdown to when they saturate our lives has begun. Google has been working on an analogous AI system with the moniker VLOGGER, and also recently put out a paper detailing how VLOGGER can create realistic videos of people moving, speaking, and gesturing with the input of a single photo. 

OpenAI also made headlines recently by introducing its own AI video generation tool, Sora, which can generate videos from text descriptions. OpenAI explained how Sora works on a dedicated page, and provided demonstrations that impressed a lot of people – and worried even more. 

I am wary of what these innovations will enable us to do, and I’m glad that, as far as we know, all three of these new tools are being kept tightly under wraps. I think realistically the best guardrails we have against the misuse of technologies like these are airtight regulations, but I’m doubtful that all governments will take these steps in time. 

YOU MIGHT ALSO LIKE…

TechRadar – All the latest technology news

Read More