Ever since OpenAI showed off ChatGPT's new Voice Mode – and incurred the wrath of Scarlett Johansson – fans of the AI assistant have been desperate to take the feature for a spin. Well, they'll have to wait a little longer, as OpenAI says its advanced Voice Mode has now been delayed.
In a post on X (formerly Twitter), OpenAI said that Alpha testing for its startlingly lifelike Voice Mode has been pushed back a month so it can focus on “improving the user experience” and also its “ability to detect and refuse certain content.” In other words, it's still not quite ready for the questionable requests the real world might throw at it.
So when exactly will the Voice Mode be pushed out beyond this initial “small group of users”? OpenAI says, “We are planning for all Plus users to have access in the fall.” But there's a slightly worrying caveat that “exact timelines depend on meeting our high safety and reliability bar.” Further delays, then, could well be on the cards.
We’re sharing an update on the advanced Voice Mode we demoed during our Spring Update, which we remain very excited about: We had planned to start rolling this out in alpha to a small group of ChatGPT Plus users in late June, but need one more month to reach our bar to launch. … — June 25, 2024
This is all a far cry from OpenAI's previous rollout plans. When it demoed the new Voice Mode in May at its Spring event, it said the feature would “be rolling out in the coming weeks.” That's still technically true, but the reality is that it'll now be more like months.
Delays for new tech features aren't exactly new, but ChatGPT subscribers aren't happy. “Biggest rug pull in history,” concluded one X commenter, with others stating that the huge demo “was probably misleading to many” and that they'd cancel their Plus accounts until the feature actually rolls out.
A casualty of the AI arms race
The outpouring of frustration from ChatGPT fans about this Voice Mode delay might seem disproportionate, but it's also understandable. An unfortunate side effect of the AI arms race is the staging of whizz-bang demos with optimistic roll-outs, followed by delays and vague promises of launches in the 'coming weeks' or, even worse, 'coming months.'
OpenAI's explanation of the delays to ChatGPT's most exciting new feature is certainly reasonable on the surface. As it explains in its statement, the new Voice Mode takes us closer to real-time, natural conversations with AI chatbots – and that is a potentially dangerous tool if it goes off the rails in the wild.
Then again, the timing of OpenAI's Spring Update event – on May 13, just a day before Google I/O 2024 – did seem conveniently designed to steal some thunder from Google's AI announcements. So, the theories that ChatGPT's new voice was demoed a little prematurely do have some credence.
Still, OpenAI has released several demo videos of the new Voice Mode on its YouTube channel (featuring the controversial 'Sky' voice, which was pulled following Scarlett Johansson's complaints that it sounded a little too much like her AI character in Her), which suggests it's far from marketing vaporware.
Just a week on from the arrival of Luma AI's Dream Machine, another big OpenAI Sora rival has just landed – and Runway's latest AI video generator might be the most impressive one yet.
Runway was one of the original text-to-video pioneers, launching its Gen-2 model back in March 2023. But its new Gen-3 Alpha model, which will apparently be “available for everyone over the coming days”, takes things up several notches with new photo-realistic powers and promises of real-world physics.
Runway's demo videos showcase how versatile its new AI model is, with the clips including realistic human faces, drone shots, simulations of handheld cameras and atmospheric dreamscapes. Runway says that all of them were generated with Gen-3 Alpha “with no modifications”.
Apparently, Gen-3 Alpha is also “the first of an upcoming series of models” that have been trained “on a new infrastructure built for large-scale multimodal training”. Interestingly, Runway added that the new AI tool “represents a significant step towards our goal of building General World Models”, which could create possibilities for gaming and more.
A 'General World Model' is one that effectively simulates an environment, including its physics – which is why one of the sample videos shows the reflections on a woman's face as she looks through a train window.
These tools won't just be for us to level up our GIF games either – Runway says it's “been collaborating and partnering with leading entertainment and media organizations to create custom versions of Gen-3 Alpha”, which means tailored versions of the model for specific looks and styles. So expect to see this tech powering adverts, shorts and more very soon.
When can you try it?
Last week, Luma AI's Dream Machine arrived to give us a free AI video generator to dabble with, but Runway's Gen-3 Alpha model is more targeted towards the other end of the AI video scale.
It's been developed in collaboration with pro video creators with that audience in mind, although Runway says it'll be “available for everyone over the coming days”. You can create a free account to try Runway's AI tools, though you'll need to pay a monthly subscription (starting from $12 per month, or around £10 / AU$18 a month) to get more credits.
You can create videos using text prompts – one demo clip, for example, was made using the prompt “a middle-aged sad bald man becomes happy as a wig of curly hair and sunglasses fall suddenly on his head”. Alternatively, you can use still images or videos as a starting point.
The realism on show is simultaneously impressive and slightly terrifying, but Runway states that the model will be released with a new set of safeguards against misuse, including an “in-house visual moderation system” and C2PA (Coalition for Content Provenance and Authenticity) provenance standards. Let the AI video battles commence.
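Runway hasn't detailed its implementation, but C2PA works by embedding a signed 'manifest' in the media file that records how the asset was made. As a rough sketch of how that provenance could be checked, here's a minimal Python wrapper assuming the Content Authenticity Initiative's open-source c2patool CLI is installed and on your PATH (the clip file name is hypothetical):

```python
import json
import subprocess

def read_c2pa_manifest(path: str) -> dict | None:
    """Return the C2PA manifest store embedded in a media file, if any.

    Assumes the open-source `c2patool` CLI is installed; invoked with
    just a file path, it prints the manifest store as JSON when one is
    present, and exits non-zero when nothing is embedded.
    """
    result = subprocess.run(["c2patool", path], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # no manifest embedded, or the file couldn't be read
    return json.loads(result.stdout)

# Hypothetical file name, for illustration only
manifest = read_c2pa_manifest("gen3_alpha_clip.mp4")
if manifest:
    # The active manifest records which tool generated or last edited the asset
    print("Provenance found:", manifest.get("active_manifest"))
else:
    print("No C2PA provenance data embedded")
```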
The text-to-video AI boom has really kicked off in the past few months, with the only downside being that the likes of OpenAI Sora still aren't available for us to try. If you're tired of waiting, a new rival called Dream Machine just landed – and you can take it for a spin right now.
Dream Machine is made by Luma AI, which has previously released an app that helps you shoot 3D photos with your iPhone. Well, now it's turned its attention to generative video with a tool that has a free tier you can use right now with a Google account – albeit with some caveats.
The main one is that Dream Machine seems to be slightly overwhelmed at the time of writing. There's currently a banner on the site stating that “generations take 120 seconds” and that “due to high demand, requests will be queued”. Our text prompt took over 20 minutes to be processed, but the results (below) are pretty impressive.
Dream Machine's outputs are more limited in length and resolution compared to the likes of OpenAI's Sora and Kling AI, but it's a good taster of how these services will work. The clips it produces are five seconds long and in 1360×752 resolution. You just type a prompt into its search bar and wait for it to appear in your account, after which you can download a watermarked version.
While there was a lengthy wait for the results (which should hopefully improve once initial demand has dropped), our prompt of 'a close-up of a dog in sunglasses driving a car through Las Vegas at night' produced a clip that was very close to what we envisaged.
Dream Machine's free plan is capped at 30 generations a month, but if you need more there are Standard (120 generations, $29.99 a month, about £24 / AU$45), Pro (400 generations, $99.99 a month, about £80 / AU$150) and Premier (2,000 generations, $499.99 a month, about £390 / AU$750) tiers.
A taste of AI videos to come
Like most generative AI video tools, questions remain about exactly what data Luma AI's model was trained on – which means that its potential outside of personal use or improving your GIF game could be limited. It also isn't the first free text-to-video tool we've seen, with Runway's Gen-2 model coming out of beta last year.
The Dream Machine website also states that the tool does have technical limitations when it comes to handling text and motion, so there's plenty of trial-and-error involved. But as a taster of the more advanced (and no doubt more expensive) AI video generators to come, it's certainly a fun tool to test drive.
That's particularly the case, given that other alternatives like Google Veo currently have lengthy waitlists. Meanwhile, more powerful models like OpenAI's Sora (which can generate videos that are 60 seconds long) won't be available until later this year, while Kling AI is currently China-only.
This will certainly change as text-to-video generation becomes mainstream, but until then, Dream Machine is a good place to practice (if you don't mind waiting a while for the results).
It feels like we're at a tipping point for AI video generators, and just a few months on from OpenAI's Sora taking social media by storm with its text-to-video skills, a new Chinese rival is doing exactly the same.
Called Kling AI, the new “video generation model” is made by the Chinese TikTok rival Kuaishou, and it's currently only available as a public demo in China via a waitlist. But that hasn't stopped it from quickly going viral, with some impressive clips that suggest it's at least as capable as Sora.
You can see some of the early demo videos on the Kling AI website, while a number of threads on X (formerly Twitter) from the likes of Min Choi (below) have rounded up what are claimed to be some impressive early creations made by the tool (with some help from editing apps).
As always, some caution needs to be applied with these early AI-generated clips, as they're cherry-picked examples, and we don't yet know anything about the hardware or other software that's been used to create them.
This is wild. Chinese AI KLING is breaking the Internet while OpenAI Sora is sleeping. People with access are already generating AI videos and short films. The videos look insane. 1. “Zootopia Grand Prix” pic.twitter.com/pmCZctsMtT — June 9, 2024
Still, those caveats aside, Kling AI certainly looks like another powerful AI video generator. It lets early testers create 1080/30p videos that are up to two minutes in length. The results, while still carrying some AI giveaways like smoothing and minor artifacts, are impressively varied, with a promising amount of coherence.
Exactly how long it'll be before Kling AI is opened up to users outside China remains to be seen. But with OpenAI suggesting that Sora will get a public release “later this year”, Kling AI had best not wait too long if it wants to become the TikTok of AI-generated video.
The AI video war heats up
Now that AI photo tools like Midjourney and Adobe Firefly are hitting the mainstream, it's clear that video generators are the next big AI battleground – and that has big implications for social media, the movie industry, and our ability to trust what we see during, say, major election campaigns.
None of them are yet perfect, and it isn't clear how long it takes to produce a clip using the likes of Sora or Kling AI, nor what kind of computing power is needed. But the leaps being made towards photorealism and simulating real-world physics have been massive in the past year, so it clearly won't be long before these tools hit the mainstream.
That battle will become an international one, too – with the US still threatening a TikTok ban, expect there to be a few more twists and turns before the likes of Kling AI roll out worldwide.
OpenAI, the tech company behind ChatGPT, has announced that it’s formed a ‘Safety and Security Committee’ that’s intended to make the firm’s approach to AI more responsible and consistent in terms of security.
It’s no secret that OpenAI and CEO Sam Altman – who will be on the committee – want to be the first to reach AGI (Artificial General Intelligence), broadly understood as artificial intelligence that matches human-like intelligence and can teach itself. Having recently debuted GPT-4o to the public, OpenAI is already training the next-generation GPT model, which it expects to be one step closer to AGI.
GPT-4o debuted to the public on May 13 as a next-level multimodal (capable of processing in multiple ‘modes’) generative AI model, able to deal with input and respond with audio, text, and images. It was met with a generally positive reception, but discussion has since arisen around its actual capabilities, implications, and the ethics of technologies like it.
Just over a week ago, OpenAI confirmed to Wired that its previous team responsible for overseeing the safety of its AI models had been disbanded and reabsorbed into other existing teams. This followed the notable departures of key company figures like OpenAI co-founder and chief scientist Ilya Sutskever, and co-lead of the AI safety ‘superalignment’ team Jan Leike. Their departures were reportedly related to concerns that OpenAI, and Altman in particular, was not doing enough to develop its technologies responsibly and was forgoing due diligence.
This has seemingly given OpenAI a lot to reflect on and it’s formed the oversight committee in response. In the announcement post about the committee being formed, OpenAI also states that it welcomes a ‘robust debate at this important moment.’ The first job of the committee will be to “evaluate and further develop OpenAI’s processes and safeguards” over the next 90 days, and then share recommendations with the company’s board.
What happens after the 90 days?
The recommendations that are subsequently agreed upon to be adopted will be shared publicly “in a manner that is consistent with safety and security.”
The committee will be made up of Chairman Bret Taylor, CEO of Quora Adam D’Angelo, and Nicole Seligman, a former executive of Sony Entertainment, alongside six OpenAI employees, including Sam Altman (as mentioned) and John Schulman, a researcher and co-founder of OpenAI. According to Bloomberg, OpenAI stated that it will also consult external experts as part of this process.
I’ll reserve my judgment for when OpenAI’s adopted recommendations are published, and I can see how they’re implemented, but intuitively, I don’t have the greatest confidence that OpenAI (or any major tech firm) is prioritizing safety and ethics as much as they are trying to win the AI race.
That’s a shame. Generally speaking, those who are striving to be the best no matter what are often slow to consider the cost and effects of their actions, and how they might impact others in a very real way – even when large numbers of people stand to be affected.
I’ll be happy to be proven wrong, and I hope I am. In an ideal world, all tech companies, whether they’re in the AI race or not, would prioritize the ethics and safety of what they’re doing at the same level that they strive for innovation. So far in the realm of AI, that does not appear to be the case from where I’m standing, and unless there are real consequences, I don’t see companies like OpenAI being swayed much to change their overall ethos or behavior.
OpenAI's high-profile run-in with Scarlett Johansson is turning into a sci-fi story to rival the movie Her, and now it's taken another turn, with OpenAI sharing documents and an updated blog post suggesting that its 'Sky' chatbot voice in the ChatGPT app wasn't a deliberate attempt to copy the actress's voice.
OpenAI preemptively pulled its 'Sky' voice option in the ChatGPT app on May 19, just before Scarlett Johansson publicly expressed her “disbelief” at how “eerily similar” it sounded to her own (in a statement shared with NPR). The actress also revealed that OpenAI CEO Sam Altman had previously approached her twice to license her voice for the app, and that she'd declined on both occasions.
But now OpenAI is on the defensive, sharing documents with The Washington Post suggesting that its casting process for the various voices in the ChatGPT app was kept entirely separate from its reported approaches to Johansson.
The documents, recordings and interviews with people involved in the process suggest that “an actress was hired to create the Sky voice months before Altman contacted Johansson”, according to The Washington Post.
The agent of the actress chosen for the Sky voice also apparently confirmed that “neither Johansson nor the movie “Her” were ever mentioned by OpenAI” during the process, nor was the actress's natural speaking voice tweaked to sound more like Johansson.
OpenAI's lead for AI model behavior, Joanne Jang, also shared more details with The Washington Post on how the voices were cast. Jang stated that she “kept a tight tent” around the AI voices project and that Altman was “not intimately involved” in the decision-making process, as he was “on his world tour during much of the casting process”.
With Johansson now reportedly lawyering up in her battle with OpenAI, the case looks likely to continue for some time.
Interestingly, the case isn't completely without precedent, despite the involvement of new tech. As noted by Mitch Glazier (chief executive of the Recording Industry Association of America), there was a similar case in the 1980s involving Bette Midler and the Ford Motor Company.
After Midler declined Ford's request to use her voice in a series of ads, Ford hired an impersonator instead – which resulted in a legal battle that Midler ultimately won, after a US court found that her voice was distinctive and should be protected against unauthorized use.
OpenAI is now seemingly distancing itself from suggestions that it deliberately did something similar with Johansson in its ChatGPT app, highlighting that its casting process started before Altman's apparent approaches to the actress.
This all follows an update to OpenAI's blog post, which included a statement from CEO Sam Altman claiming: “The voice of Sky is not Scarlett Johansson's, and it was never intended to resemble hers. We cast the voice actor behind Sky’s voice before any outreach to Ms. Johansson. Out of respect for Ms. Johansson, we have paused using Sky’s voice in our products. We are sorry to Ms. Johansson that we didn’t communicate better.”
But Altman's post on X (formerly Twitter) just before OpenAI's launch of GPT-4o, which simply stated “her”, doesn't help distance the company from suggestions that it was attempting to recreate the famous movie in some form, regardless of how explicit that was in its casting process.
OpenAI just announced its new GPT-4o (‘o’ for ‘omni’) model, which combines text, video, and audio processing in real time to answer questions, hold better conversations, solve maths problems, and more. It’s the most ‘human’-like iteration of the large language model (LLM) so far, and it’ll be available to all users for free shortly. GPT-4o has launched with a macOS app for ChatGPT Plus subscribers to try – but interestingly, there’s no Windows app just yet.
A blog post from OpenAI specifies that the company “plan[s] to launch a Windows version later this year,” choosing instead to offer the tech to Mac users first. This is odd, considering Microsoft has pumped billions of dollars into OpenAI and has its own OpenAI-powered digital assistant, Copilot. So, you would think the platform to receive initial exclusive access to a groundbreaking bit of tech like GPT-4o would be Microsoft Windows.
Why do things this way around? One theory floated by Windows Latest is that this could be a clever move on OpenAI’s part, as Apple users might be more inclined than Windows users to prefer a native app over a web app. As an Apple user, I would indeed prefer to have an app for something I might use as regularly as GPT-4o, rather than having to navigate a web app – so perhaps other Apple fans feel the same.
A further consideration here: with AI Explorer incoming as the big feature for Windows 11 later this year (in the 24H2 update), Microsoft may not want another feature like GPT-4o muddying the AI waters in its desktop OS.
Jumping in before Apple can
With such a jump between the public version of ChatGPT and the new GPT-4o model (which is also set to be available for free, albeit with limited use), OpenAI will surely want as many people using its product as possible. So, venturing into macOS territory makes sense if the firm wants to tap into a group of people who haven’t gravitated to its AI naturally.
So far Apple has not made any great efforts to integrate AI tools into its operating system in the same way that Microsoft has Copilot embedded into a user’s desktop taskbar. That leaves OpenAI with the perfect opportunity to jump onto the desktops of Mac users and show off what GPT-4o can do before Apple gets the chance to introduce its own AI assistant for macOS – if it does so.
We'll have to wait for WWDC to find out if Apple has its own take on the Copilot concept ready or if Mac users interested in artificial intelligence tools will find a new bestie in GPT-4o. That’s not to say I wouldn’t eat up whatever Apple has up its sleeve for Mac users – just that swapping over may be a little harder once I’m used to the way GPT-4o for Mac works for me.
OpenAI just held its eagerly anticipated spring update event, making a series of exciting announcements and demonstrating the eye- and ear-popping capabilities of its newest GPT AI models. There were changes to model availability for all users, and at the center of the hype and attention: GPT-4o.
Coming just 24 hours before Google I/O, the launch puts Google's Gemini in a new perspective. If GPT-4o is as impressive as it looked, Google and its anticipated Gemini update had better be mind-blowing.
What's all the fuss about? Let's dig into all the details of what OpenAI announced.
1. The announcement and demonstration of GPT-4o – which will be available to all users for free
The biggest announcement of the stream was the unveiling of GPT-4o (the 'o' standing for 'omni'), which combines audio, visual, and text processing in real time. Eventually, this version of OpenAI's GPT technology will be made available to all users for free, with usage limits.
For now, though, it's being rolled out to ChatGPT Plus users, who will get up to five times the messaging limits of free users. Team and Enterprise users will also get higher limits and access to it sooner.
GPT-4o will have GPT-4's intelligence, but it'll be faster and more responsive in daily use. Plus, you'll be able to provide it with or ask it to generate any combination of text, image, and audio.
The stream saw Mira Murati, Chief Technology Officer at OpenAI, and two researchers, Mark Chen and Barret Zoph, demonstrate GPT-4o's real-time responsiveness in conversation while using its voice functionality.
The demo began with a conversation about Chen's mental state, with GPT-4o listening and responding to his breathing. It then told a bedtime story to Barret with increasing levels of dramatics in its voice upon request – it was even asked to talk like a robot.
It continued with a demonstration of Barret “showing” GPT-4o a mathematical problem and the model guiding Barret through solving it by providing hints and encouragement. Chen asked why this specific mathematical concept was useful, and GPT-4o answered at length.
They followed this up by showing GPT-4o some code, which it explained in plain English before providing feedback on the plot that the code generated. The model talked about notable events, the axis labels, and the range of inputs. This was to show OpenAI's continued commitment to improving GPT models' interaction with codebases, as well as their mathematical abilities.
The penultimate demonstration was an impressive display of GPT-4o's linguistic abilities, as it translated out loud between two languages – English and Italian – in real time.
Lastly, OpenAI provided a brief demo of GPT-4o's ability to identify emotions from a selfie sent by Barret, noting that he looked happy and cheerful.
If the AI model works as demonstrated, you'll be able to speak to it more naturally than many existing generative AI voice models and other digital assistants. You'll be able to interrupt it instead of having a turn-based conversation, and it'll continue to process and respond – similar to how we speak to each other naturally. Also, the lag between query and response, previously about two to three seconds, has been dramatically reduced.
ChatGPT equipped with GPT-4o will roll out over the coming weeks, free to try. This comes a few weeks after OpenAI made ChatGPT available to try without signing up for an account.
2. Free users will have access to the GPT Store, the memory function, the browse function, and advanced data analysis
GPTs are custom chatbots created by OpenAI and ChatGPT Plus users to help enable more specific conversations and tasks. Now, many more users can access them in the GPT Store.
Additionally, free users will be able to use ChatGPT's memory functionality, which makes it a more useful and helpful tool by giving it a sense of continuity. Also being added to the no-cost plan are ChatGPT's vision capabilities, which let you converse with the bot about uploaded items like images and documents. The browse function, meanwhile, lets ChatGPT search the web for up-to-date answers.
ChatGPT's abilities have improved in quality and speed in 50 languages, supporting OpenAI’s aim to bring its powers to as many people as possible.
3. GPT-4o will be available in the API for developers
OpenAI's latest model will be available for developers to incorporate into their AI apps as a text and vision model. The support for GPT-4o's video and audio abilities will be launched soon and offered to a small group of trusted partners in the API.
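For developers who want to experiment once access lands, the text and vision sides are reachable through OpenAI's existing Python SDK. Here's a minimal sketch; the prompt and image URL are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Plain text request against the new model
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain multimodal models in one sentence."}],
)
print(response.choices[0].message.content)

# Vision: text and an image URL can be combined in a single user message
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            # Placeholder URL, for illustration only
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```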
4. The new ChatGPT desktop app
OpenAI is releasing a desktop app for macOS to advance its mission to make its products as easy and frictionless as possible, wherever you are and whichever model you're using, including the new GPT-4o. You’ll be able to assign keyboard shortcuts to carry out tasks even more quickly.
According to OpenAI, the desktop app is available to ChatGPT Plus users now and will be available to more users in the coming weeks. It sports a similar design to the updated interface in the mobile app as well.
5. A refreshed ChatGPT user interface
ChatGPT is getting a more natural and intuitive user interface, refreshed to make interaction with the model easier and less jarring. OpenAI wants to get to the point where people barely focus on the AI itself, and where ChatGPT feels friendlier. This means a new home screen, message layout, and other changes.
6. OpenAI's not done yet
The mission is a bold one, with OpenAI looking to demystify AI while building some of the most complex technology that ordinary people can access. Murati wrapped up by saying that we'll soon hear what OpenAI is preparing to show us next, and by thanking Nvidia for providing the advanced GPUs that made the demonstration possible.
OpenAI is determined to shape our interaction with devices, closely studying how humans interact with each other and trying to apply its learnings to its products. The latency of processing all of the different nuances of interaction is part of what dictates how we behave with products like ChatGPT, and OpenAI has been working hard to reduce this. As Murati puts it, its capabilities will continue to evolve, and it’ll get even better at helping you with exactly what you’re doing or asking about at exactly the right moment.
OpenAI has announced it's got news to share via a public livestream on Monday, May 13 – but, contrary to previous rumors, the developer of ChatGPT and Dall-E apparently isn't going to use the online event to launch a search engine.
In a social media post, OpenAI says that “some ChatGPT and GPT-4 updates” will be demoed at 10am PT / 1pm ET / 6pm BST on Monday May 13 (which is Tuesday, May 14 at 3am AEST for those of you in Australia). A livestream is going to be available.
OpenAI CEO Sam Altman followed up by saying the big reveal isn't going to be GPT-5 and isn't going to be a search engine, so make of that what you will. “We've been hard at work on some new stuff we think people will love,” Altman says. “Feels like magic to me.”
Rumors that OpenAI would be taking on Google directly with its own search engine, possibly developed in partnership with Microsoft and Bing, have been swirling for months. It sounds like it's not ready yet though – so we'll have to wait.
OpenAI, Google, and Apple
not gpt-5, not a search engine, but we’ve been hard at work on some new stuff we think people will love! feels like magic to me. monday 10am PT. https://t.co/nqftf6lRL1 — May 10, 2024
AI chatbots such as Microsoft Copilot already do a decent job of pulling up information from the web – indeed, at their core, these Large Language Models (LLMs) are essentially training themselves on websites in a similar way to how Google indexes them.
It's possible that the future of web search is not a list of links but rather an answer from an AI, based on those links – which raises the question of how websites could carry on getting the revenue they need to supply LLMs with information in the first place. Google itself has also been experimenting with AI in its search results.
In other OpenAI news, according to Mark Gurman at Bloomberg, Apple has “closed in” on a deal to inject some ChatGPT smarts into iOS 18, due later this year. The companies are apparently now “finalizing terms” on the deal.
However, Gurman says that a deal between Apple and Google to use Google's Gemini AI engine is still on the table too. We know that Apple is planning to go big on AI this year, though it sounds as though it may need some help along the way.
You’ve probably noticed a few AI-generated images sprinkled throughout your social media feeds – and there are likely a few you’ve scrolled right past that slipped your keen eyes.
For those of us who have been immersed in the world of generative AI, spotting AI images is a little easier, as you develop a mental checklist of what to look out for.
However, as the technology gets better and better, it is going to get a lot harder to tell. To solve this, OpenAI is developing new methods to track AI-generated images and prove what has and has not been artificially generated.
According to a blog post, OpenAI’s new proposed methods will add a tamper-resistant ‘watermark’ that will tag content with invisible ‘stickers.’ So, if an image is generated with OpenAI’s DALL-E generator, the classifier will flag it even if the image is warped or saturated.
The blog post claims the tool will have around 98% accuracy when spotting images made with DALL-E. However, it will only flag 5-10% of pictures from other generators like Midjourney or Adobe Firefly.
So, it’s great for in-house images, but not so great for anything produced outside of OpenAI. While it may not be as impressive as one would hope in some respects, it’s a positive sign that OpenAI is starting to address the flood of AI images that are getting harder and harder to distinguish.
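To put those figures in perspective, here's a quick back-of-envelope calculation using the recall numbers OpenAI quotes; the 50/50 feed mix is a hypothetical assumption for illustration:

```python
# Recall figures from OpenAI's blog post: ~98% for DALL-E images,
# ~5-10% for images from other generators. The feed mix is assumed.
dalle_images = 1_000
other_ai_images = 1_000
other_recall = 0.075  # midpoint of the quoted 5-10% range

flagged = 0.98 * dalle_images + other_recall * other_ai_images
total = dalle_images + other_ai_images
print(f"{flagged:.0f} of {total} AI images flagged ({flagged / total:.0%})")
# -> 1055 of 2000 flagged (53%): barely half, once non-DALL-E images are common
```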
Okay, so this may not seem like a big deal to some, as a lot of AI-generated images are either memes or high-concept art that's pretty harmless. That said, there's also a growing number of cases where people are creating hyper-realistic fake photos of politicians, celebrities, and people in their lives that could lead to misinformation spreading at an incredibly fast pace.
Hopefully, as these kinds of countermeasures get better and better, the accuracy will only improve, and we can have a much more accessible way to double-check the authenticity of the images we come across in our day-to-day life.