Tim Cook explains why Apple’s generative AI could be the best on smartphones – and he might have a point

It’s an open secret that Apple is going to unveil a whole host of new artificial intelligence (AI) software features in the coming weeks, with major overhauls planned for iOS 18, macOS 15, and more. But it’s not just new features that Apple is hoping to hype up – it’s the way in which those AI tools are put to use.

Tim Cook has just let slip that Apple’s generative AI will have some major “advantages” over its rivals. While the Apple CEO didn’t explain exactly what Apple’s generative AI will entail (we can expect to hear about that at WWDC in June), what he did say makes a whole lot of sense.

Speaking on Apple’s latest earnings call yesterday, Cook said: “We believe in the transformative power and promise of AI, and we believe we have advantages that will differentiate us in this new era, including Apple’s unique combination of seamless hardware, software, and services integration, groundbreaking Apple silicon with our industry-leading neural engines, and our unwavering focus on privacy, which underpins everything we create.”

Cook also said Apple is making “significant investments” in generative AI, and that he has “some very exciting things” to unveil in the near future. “We continue to feel very bullish about our opportunity in generative AI,” he added.

Why Tim Cook might be right

Siri

(Image credit: Unsplash [Omid Armin])

There are plenty of reasons why Apple’s AI implementation could be an improvement over what's come before it, not least of which is Apple’s strong track record when it comes to privacy. The company often prefers to encrypt data and run tasks on your device, rather than sending anything to the cloud, which helps ensure that it can’t be accessed by nefarious third parties – and when it comes to AI, it looks like this approach might play out again.

Bloomberg's Mark Gurman, for example, has reported that Apple’s upcoming AI features will work entirely on your device, thereby continuing Apple’s commitment to privacy, amid concerns that the rapid development of AI is putting security and privacy at risk. If successful, it could also be a more ethical approach to AI than that employed by Apple’s rivals.

In addition, the fact that Apple creates both the hardware and software in its products allows them to be seamlessly integrated in ways most of its competitors can’t match. It also means devices can be designed with specific use cases in mind that rely on hardware and software working together, rather than Apple having to rely on outside manufacturers to play ball. When it comes to AI, that could result in all kinds of benefits, from performance improvements to new app features.

We’ll find out for sure in the coming weeks. Apple is hosting an iPad event on May 7, which reports have suggested Apple might use to hint at upcoming AI capabilities. Beyond that, the company’s Worldwide Developers Conference (WWDC) lands on June 10, where Apple is expected to devote significant energy to its AI efforts. Watch this space.

You might also like

TechRadar – All the latest technology news

Read More

Could generative AI work without online data theft? Nvidia’s ChatRTX aims to prove it can

Nvidia continues to invest in AI initiatives, and ChatRTX is no exception thanks to its latest update.

ChatRTX is, according to the tech giant, a “demo app that lets you personalize a GPT large language model (LLM) connected to your own content.” That content comprises your PC’s local documents, files, folders, and so on, and the app essentially builds a custom AI chatbot from that information.

Because it doesn’t require an internet connection, ChatRTX gives users speedy answers to queries about information that might otherwise be buried under all those computer files. With the latest update, it has access to even more data and LLMs, including Google’s Gemma and ChatGLM3, an open, bilingual (English and Chinese) LLM. It can also search locally for photos, and has Whisper support, allowing users to converse with ChatRTX through an AI speech-recognition program.
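ChatRTX's internals aren't public in detail, but the core idea of answering queries from local files can be sketched in a few lines of Python. Everything below is a toy stand-in: the `retrieve` function, the example notes, and the naive word-overlap scoring are invented for illustration, where the real app presumably uses GPU-accelerated embedding search via TensorRT-LLM.

```python
def retrieve(query, docs):
    # Naive local retrieval: score each document by how many of its words
    # appear in the query, then rank by score. A crude stand-in for the
    # embedding-based search an app like ChatRTX presumably performs; the
    # key point is that everything happens on the local machine.
    query_words = set(query.lower().split())
    scored = []
    for name, text in docs.items():
        tokens = [t.strip(".,:;!?") for t in text.lower().split()]
        score = sum(1 for t in tokens if t in query_words)
        if score > 0:
            scored.append((score, name))
    return [name for score, name in sorted(scored, reverse=True)]

notes = {
    "taxes_2023.txt": "Filed federal taxes in April. Refund expected.",
    "recipes.txt": "Pizza dough: flour, water, yeast, salt.",
}
print(retrieve("pizza dough recipe", notes))  # → ['recipes.txt']
```

The point isn't the scoring method; it's the data flow. The query and the documents never leave the machine, which is exactly the privacy argument for this class of tool.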

Nvidia uses TensorRT-LLM software and RTX graphics cards to power ChatRTX’s AI. And because it’s local, it’s far more secure than online AI chatbots. You can download ChatRTX here to try it out for free.

Can AI escape its ethical dilemma?

The concept of an AI chatbot using local data from your PC, instead of training on (read: stealing) other people’s online works, is rather intriguing. It seems to sidestep the ethical dilemma of using and hoarding copyrighted works without permission. It also seems to solve another long-term problem that’s plagued many a PC user: actually finding long-buried files in your file explorer, or at least the information trapped within them.

However, there’s the obvious question of how such an extremely limited data pool could negatively affect the chatbot. Unless the user is particularly skilled at curating the data it draws on, that could become a serious issue over time. Of course, using it purely to locate information on your PC is perfectly fine, and most likely the proper use.

But the point of an AI chatbot is to have unique and meaningful conversations. Maybe there was a time when we could have done that without the rampant theft, but corporations have powered their AI with words stolen from other sites, and now the two are irrevocably tied.

Data theft is, unethically, the very thing that currently makes chatbots well-rounded enough not to get trapped in feedback loops, so Nvidia could represent a middle ground for generative AI. If fully developed, ChatRTX could prove that we don’t need that ethical transgression to power and shape these models. Here’s hoping Nvidia can get it right.


Apple is forging a path towards more ethical generative AI – something sorely needed in today’s AI-powered world

Copyright is something of a minefield right now when it comes to AI, and there’s a new report claiming that Apple’s generative AI – specifically its ‘Ajax’ large language model (LLM) – may be one of the only ones to have been both legally and ethically trained. It’s claimed that Apple is trying to uphold privacy and legality standards by adopting innovative training methods. 

Copyright law in the age of generative AI is difficult to navigate, and it’s becoming increasingly important as AI tools become more commonplace. One of the most glaring issues that comes up, again and again, is that many companies train their large language models (LLMs) using copyrighted works, typically not disclosing whether they license that training material. Sometimes, the outputs of these models include entire sections of copyright-protected works. 

The justification some of these companies offer for training their LLMs so extensively on copyrighted material is that, not dissimilar to humans, these models need a substantial amount of information (called training data) to learn and to generate coherent and convincing responses; as far as these companies are concerned, copyrighted materials are fair game.

Many critics of generative AI consider it copyright infringement when tech companies use such works in the training and output of LLMs without explicit agreements with copyright holders or their representatives. Still, this criticism hasn’t put tech companies off doing exactly that, and it’s assumed to be the case for most AI tools, fueling a growing pool of resentment toward companies in the generative AI space.

OpenAI CEO Sam Altman attends the artificial intelligence Revolution Forum. New York, US - 13 Jan 2023

(Image credit: Shutterstock/photosince)

A growing number of legal challenges have been mounted in these tech companies’ direction. OpenAI and Microsoft were sued by the New York Times for copyright infringement in December 2023, with the publisher accusing the two companies of training their LLMs on millions of New York Times articles. In September 2023, OpenAI and Microsoft were also sued by a number of prominent authors, including George R. R. Martin, Michael Connelly, and Jonathan Franzen. In July 2023, over 15,000 authors signed an open letter directed at companies such as Microsoft, OpenAI, Meta, and Alphabet, calling on the leaders of the tech industry to protect writers and to properly credit and compensate authors for their works when using them to train generative AI models.

In April of this year, The Register reported that Amazon was hit with a lawsuit from an ex-employee alleging she faced mistreatment, discrimination, and harassment, and in the process she testified about her experience with issues of copyright infringement. The employee alleges that she was told to deliberately ignore and violate copyright law to make Amazon’s products more competitive, and that her supervisor told her that “everyone else is doing it” when it came to copyright violations. Apple Insider echoes this claim, stating that this seems to be an accepted industry standard.

As we’ve seen with many other novel technologies, legislation and ethical frameworks always arrive after an initial delay, but this is becoming a more problematic aspect of generative AI models that the companies responsible for them will have to respond to.

A man editing a photo on a Mac Mini

(Image credit: Apple)

The Apple approach to ethical AI training (that we know of so far)

It looks like at least one major tech player might be taking the more careful and considered route to avoid as many legal (and moral!) challenges as possible – and somewhat surprisingly, it’s Apple. According to Apple Insider, Apple has been diligently pursuing licenses for major news publications’ works when looking for AI training material. Back in December, Apple petitioned to license the archives of several major publishers to use as training material for its own LLM, known internally as Ajax.

It’s speculated that Ajax will power basic on-device functionality in future Apple products, with Apple potentially licensing software like Google’s Gemini for more advanced features, such as those requiring an internet connection. Apple Insider writes that this approach lets Apple avoid certain copyright infringement liabilities, as Apple wouldn’t be responsible for infringement by, say, Google Gemini.

A paper published in March detailed how Apple intends to train its in-house LLM: on a carefully chosen selection of images, image-text pairs, and text-based input. In its methods, Apple prioritized better image captioning and multi-step reasoning while paying attention to preserving privacy. That last factor is made all the more possible by the Ajax LLM being entirely on-device, and therefore not requiring an internet connection. There is a trade-off: it means Ajax won’t be able to check for copyrighted content and plagiarism itself, as it won’t be able to connect to the online databases that store copyrighted material.

There is one other caveat that Apple Insider reveals about this when speaking to sources who are familiar with Apple’s AI testing environments: there don’t currently seem to be many, if any, restrictions on users utilizing copyrighted material themselves as the input for on-device test environments. It's also worth noting that Apple isn't technically the only company taking a rights-first approach: art AI tool Adobe Firefly is also claimed to be completely copyright-compliant, so hopefully more AI startups will be wise enough to follow Apple and Adobe's lead.

I personally welcome this approach from Apple, as I think human creativity is one of the most incredible capabilities we have, and it should be rewarded and celebrated, not fed to an AI. We’ll have to wait to learn more about Apple’s rules around copyright and AI training, but I agree with Apple Insider’s assessment that this sounds like an improvement – especially since some AIs have been documented regurgitating copyrighted material word-for-word. We can look forward to learning more about Apple’s generative AI efforts very soon; they're expected to be a key driver of its developer-focused software conference, WWDC 2024.


Midjourney just changed the generative image game and showed me how comics, film, and TV might never be the same

Midjourney, the generative AI platform that you can currently use on Discord, just introduced the concept of reusable characters, and I am blown away.

It's a simple idea: Instead of using prompts to create countless generative image variations, you create and reuse a central character to illustrate all your themes, live out your wildest fantasies, and maybe tell a story.

Up until recently, Midjourney, which is trained as a diffusion model (noise is added to an original image and the model learns to de-noise it, teaching it about the image), could create some beautiful and astonishingly realistic images based on prompts you put in the Discord channel (“/imagine: [prompt]”). But unless you were asking it to alter one of its generated images, every image set and character would look different.
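That parenthetical is the whole trick, and it's worth a tiny sketch. The Python below is a toy illustration of the forward (noising) half only, with made-up pixel values and no neural network; real diffusion models like Midjourney's also learn the reverse, de-noising step.

```python
import math
import random

def add_noise(pixels, alpha, rng):
    # One forward-diffusion step: blend each pixel toward Gaussian noise.
    # alpha=1.0 leaves the image untouched; alpha near 0 is almost pure noise.
    return [math.sqrt(alpha) * p + math.sqrt(1 - alpha) * rng.gauss(0, 1)
            for p in pixels]

rng = random.Random(0)
image = [0.2, 0.8, 0.5, 0.1]                  # a tiny four-"pixel" image
slightly_noisy = add_noise(image, 0.99, rng)  # early in the noising schedule
mostly_noise = add_noise(image, 0.01, rng)    # late in the schedule
```

Training shows the model images at many noise levels so it learns to run this process backwards, from pure noise to a clean image that matches a prompt.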

Now, Midjourney has cooked up a simple way to reuse your Midjourney AI characters. I tried it out and, for the most part, it works.

Midjourney AI character creation

I guess I don’t know how to describe myself. (Image credit: Future)

Things are getting weird (Image credit: Future)

In one prompt, I described someone who looked a little like me, chose my favorite of Midjourney's four generated image options, upscaled it for more definition, and then, using the new “--cref” parameter and the URL for my generated image (with the character I liked), I got Midjourney to generate new images featuring the same AI character.

Later, I described a character with Charles Schulz's Peanuts character qualities and, once I had one I liked, reused him in a different prompt scenario where he had his kite stuck in a tree (Midjourney couldn't or wouldn't put the kite in the tree branches).

Midjourney AI character creation

An homage to Charles Schulz (Image credit: Future)

It's far from perfect. Midjourney still tends to over-adjust the art, but I contend the characters in the new images are the same ones I created in my initial images. The more descriptive your initial character-creation prompts, the better the results in subsequent images.

Perhaps the most startling thing about Midjourney's update is the utter simplicity of the creative process. Writing natural-language prompts has always been easy, but training the system to make your character do something might typically take some programming or even AI-model expertise. Here it's just a simple prompt, one parameter, and an image reference.

Midjourney AI character creation

Got a lot closer with my photo as a reference (Image credit: Future)

While it's easier to take one of Midjourney's own creations and use that as your foundational character, I decided to see what Midjourney would do if I turned myself into a character using the same “--cref” parameter. I found an online photo of myself and entered this prompt: “/imagine: making a pizza --cref [link to a photo of me]”.

Midjourney quickly spit out an interpretation of me making a pizza. At best, it's the essence of me. I selected the least objectionable one and then crafted a new prompt using the URL from my favorite me.

Midjourney AI character creation

Oh, hey, Not Tim Cook (Image credit: Future)

Unfortunately, when I entered this prompt: “interviewing Tim Cook at Apple headquarters”, I got a grizzled-looking Apple CEO eating pizza and another image where he's holding an iPad that looks like it has pizza for a screen.

When I removed “Tim Cook” from the prompt, Midjourney was able to drop my character into four images. In each, Midjourney Me looks slightly different. There was one, though, where it looked like my favorite me enjoying a pizza with a “CEO” who also looked like me.

Midjourney AI character creation

Midjourney me enjoying pizza with my doppelgänger CEO (Image credit: Future)

Midjourney's AI will improve and soon it will be easy to create countless images featuring your favorite character. It could be for comic strips, books, graphic novels, photo series, animations, and, eventually, generative videos.

Such a tool could speed storyboarding but also make character animators very nervous.

If it's any consolation, I'm not sure Midjourney understands the difference between me and a pizza and pizza and an iPad – at least not yet.


These new smart glasses can teach people about the world thanks to generative AI

It was only a matter of time before someone added generative AI to an AR headset, and startup Brilliant Labs has taken the plunge with its recently revealed Frame smart glasses.

Looking like a pair of Where’s Waldo glasses (or Where’s Wally to our UK readers), the Frame houses a multimodal digital assistant called Noa. It consists of multiple AI models from other brands working in unison to help users learn about the world around them, just by looking at something and issuing a command. Let’s say you want to know more about the nutritional value of a raspberry. Thanks to OpenAI tech, you can command Noa to perform a “visual analysis” of the subject, with the read-out appearing on the outer AR lens. Additionally, it can offer real-time language translation via Whisper AI.

The Frame can also search the internet via its Perplexity AI model, and search results will even provide price tags for potential purchases. In a recent VentureBeat article, Brilliant Labs claims Noa can provide instantaneous price checks for clothes just by scanning the piece, or fish out listings for new houses on the market; all you have to do is look at the house in question. It can even generate images on the fly through Stable Diffusion, according to ZDNET.

Evolving assistant

Returning to VentureBeat, its report offers a deeper insight into how Noa works.

The digital assistant is always on, constantly taking in information from its environment. And it’ll apparently “adopt a unique personality” over time. The publication explains that upon activating for the first time, Noa appears as an “egg” on the display. Owners will have to answer a series of questions, and upon finishing, the egg hatches into a character avatar whose personality reflects the user. As the Frame is used, Noa analyzes the interactions between it and the user, evolving to become better at tackling tasks.

Brilliant Labs Frame exploded view

(Image credit: Brilliant Labs)

An exploded view of the Frame can be found on Brilliant Labs’ official website, providing interesting insight into how the tech works. On-screen content is projected by a micro-OLED onto a “geometric prism” in the lens. 9to5Google points out this is reminiscent of how Google Glass worked. On the nose bridge is the Frame’s camera, sitting on a PCBA (printed circuit board assembly).

At the end of the stems you have the batteries, housed inside two big hubs. Brilliant Labs states the frames can last a whole day, and to charge them you’ll have to plug in the Mister Power dongle, inadvertently turning the glasses into a high-tech Groucho Marx impersonation.

Brilliant Labs Frame with Mister Power

(Image credit: Brilliant Labs)

Availability

Currently open for pre-order, the Frame will run you $350 a pair. It’ll be available in three colors: Smokey Black, Cool Gray, and the transparent H20. You can opt for prescription lenses, though doing so bumps the price tag to $448. There’s a chance Brilliant Labs won’t have your exact prescription, in which case it recommends selecting the option that most closely matches your actual prescription. Shipping is free, and the first batch rolls out April 15.

It appears all of the AI features are subject to a daily usage cap, and Brilliant Labs plans to launch a subscription service lifting the limit. We reached out to the company for clarification, and asked several other questions, such as exactly how the Frame receives input. This story will be updated at a later time.

Until then, check out TechRadar's list of the best VR headsets for 2024.


Google Maps could become smarter than ever thanks to generative AI

Google Maps is getting a dose of generative AI to let users search and find places in a more conversational manner, and serve up useful and interesting suggestions. 

This smart AI tech comes in the form of an “Ask about” user interface, where people can ask Google Maps questions like where to find “places with a vintage vibe” in San Francisco. That prompts the AI to analyze information about nearby businesses and places, such as photos, ratings, and reviews, to serve up suggestions related to the question being asked.

In this example, Google said the AI served up vinyl record stores, clothing stores, and flea markets in its suggestions. These included each location along with its rating, review count, and distance by car. The AI then provides review summaries that highlight why a place might be of interest.

You can then ask follow-up questions, and the AI remembers your previous query, using it as context for your next search. For example, when asked “How about lunch?”, the AI will take into account the “vintage vibe” request from the previous prompt and use it to suggest an old-school diner nearby.
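Google hasn't said how Maps carries that context, but the mechanics of a contextual follow-up can be sketched simply: keep a history of earlier queries and fold it into each new one before searching. The `build_query` function and its behavior below are a guess for illustration, not Google's implementation.

```python
def build_query(history, question):
    # Fold earlier queries into the new one, so a terse follow-up like
    # "How about lunch?" inherits the earlier "vintage vibe" context.
    # Purely illustrative; Google hasn't published the actual mechanism.
    context = " ".join(history)
    history.append(question)
    return (context + " " + question).strip()

history = []
first = build_query(history, "places with a vintage vibe in San Francisco")
follow_up = build_query(history, "How about lunch?")
print(follow_up)  # the vintage-vibe query rides along as context
```

In a real system the history would more likely be fed to the LLM as conversation turns rather than concatenated text, but the effect is the same: the new search is interpreted in light of the old one.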

Screengrabs of the new generative AI features on Google Maps showing searches and suggestions

(Image credit: Google)

You can save the suggestions or share them, helping you coordinate with friends who might all have different preferences like being vegan, checking if a venue is dog friendly, making sure it is indoors, and so on.

By tapping into the search giant’s large-language models, Google Maps can analyze detailed information using data from more than 250 million locations, and photos, ratings and reviews from its community of over 300 million contributors to provide “trustworthy” suggestions. 

The experimental feature is launching this week but is only coming to “select Local Guides” in the US. It will use these members' insights and feedback to develop and test the feature before what’s likely to be its eventual full rollout, which Google has not provided a date for.

Does anyone want this?

Users on the Android subreddit were very critical of the feature, with some referring to AI as a buzzword that big companies are chasing for clout. User lohet stated: “Generative AI doesn't have any place in a basic database search. There's nothing to generate. It's either there or it's not.”

Many said they would rather see Google improve offline Maps and its location-sharing features. User chronocapybara summarized the feelings of others in the forum: “If it helps find me things I'm searching for, I'm all for it. If it offloads work to the cloud, making search slower, just to give me more promoted places that are basically ads, then no.”

However, AI integration in our everyday apps is here to stay, and its inclusion in Google Maps could help users discover brand-new places more easily, and help smaller businesses gain attention and find an audience.

Until the feature rolls out, you can make the most of Google Maps with our 10 things you didn't know Google Maps could do.


Google’s new generative AI aims to help you get those creative juices flowing

It’s a big day for Google AI as the tech giant has launched a new image-generation engine aimed at fostering people’s creativity.

The tool is called ImageFX, and it runs on Imagen 2, Google's “latest text-to-image model,” which the company claims can deliver its “highest-quality images yet.” Like so many generative AIs before it, it creates content from a command entered into a text box. What’s unique about the engine is that it comes with “Expressive Chips”: dropdown menus over keywords that let you quickly alter content with adjacent ideas. For example, ImageFX gave us a sample prompt of a dress carved out of deadwood, complete with foliage. After it made a series of pictures, the AI offered the opportunity to change certain aspects, turning a beautiful forest-inspired dress into an ugly shirt made out of plastic and flowers.

ImageFX-generated dress (Image credit: Future)

ImageFX-generated shirt (Image credit: Future)

Options in the Expressive Chips don’t change; they remain fixed to the initial prompt, although you can add more to the list by selecting the tags down at the bottom. There doesn’t appear to be a way to remove tags, so users will have to click the Start Over button to begin anew. If the AI manages to create something you enjoy, it can be downloaded or shared on social media.

Be creative

This obviously isn’t the first time Google has released a text-to-image generative AI. In fact, Bard just received the same ability. The main difference with ImageFX is, again, its encouragement of creativity. The chips can help spark inspiration by giving you ideas for directing the engine; ideas you may never have thought of. Bard’s feature, on the other hand, offers little to no guidance, and because it's less user-friendly, directing Bard's image generation will be trickier.

ImageFX is free to use on Google’s AI Test Kitchen. Do keep in mind it’s still a work in progress. Upon visiting the page for the first time, you’ll be met with a warning message telling you the AI “may display inaccurate info”, and in some cases, offensive content. If this happens to you, the company asks that you report it to them by clicking the flag icon. 

Also, Google wants people to keep things clean. It links to its Generative AI Prohibited Use Policy in the warning, listing what you can’t do with ImageFX.

AI updates

In addition to ImageFX, Google made several updates to past experimental AIs. 

MusicFX, the brand’s text-to-music engine, now allows users to generate songs up to 70 seconds in length, as well as alter their speed. The tool even received Expressive Chips of its own, helping people get those creative juices flowing, plus a performance boost enabling it to pump out content faster than before. TextFX, on the other hand, didn’t see a major upgrade or new features; Google mainly updated the website so it’s easier to navigate.

MusicFX's new layout

(Image credit: Future)

Everything you see here is available to users in the US, New Zealand, Kenya, and Australia. There's no word on whether the AI will roll out elsewhere, although we did ask. This story will be updated at a later time.

Until then, check out TechRadar's roundup of the best AI art generators for 2024 where we compare them to each other. There's no clear winner, but they do have their specialties. 


Meta opens the gates to its generative AI tech with launch of new Imagine platform

Amongst all the hullabaloo of Google’s Gemini launch, Meta opened the gates to its free-standing image generator website called Imagine with Meta AI.

The company has been tinkering with this technology for some time now. WhatsApp, for instance, has had a beta in-app image generator since August of this year, but accessing the feature required people to have Meta's app installed on their smartphones. Now, with Imagine, all you need is an email address to create an account on the platform. Once in, you’re free to create whatever you want by entering a simple text prompt, and it functions similarly to DALL-E.

We tried out the website ourselves and discovered the AI will create four 1,280 x 1,280 pixel JPEG images that you can download by clicking the three dots in the upper right corner. The option will appear in the drop-down menu.

Below is a series of images we asked the engine to make. You’ll notice in the bottom left corner is a watermark stating that it was created by an AI.

Homer according to Meta (Image credit: Future)

Me, according to Meta (Image credit: Future)

Char's Zaku, according to Meta (Image credit: Future)

We were surprised to discover that it’s able to create content featuring famous cartoon characters like Homer Simpson and even Mickey Mouse. You’d think there would be restrictions for certain copyrighted material, but apparently not. As impressive as these images may be, there are noticeable flaws. If you look at the Homer Simpson sample, you can see parts of the picture melting into each other. Plus, the character looks downright bizarre.

Limitations (and the workarounds)

A lot of care was put into the development of Imagine, which is powered by Meta's proprietary Emu learning model. According to a company research paper from September, Emu was trained on “1.1 billion images”. At the time, no one really knew the source of all this data; however, Nick Clegg, Meta’s president of global affairs, told Reuters it used public Facebook and Instagram posts to train the model. Altogether, over a billion social media accounts were scraped.

To rein in all this data, Meta implemented some restrictions. The tech keeps things family-friendly: it'll refuse prompts that are violent or sexual, and prompts can't mention a famous person.

Despite the tech giant’s best efforts, it’s not perfect by any stretch. It appears there is a way to get around said limitations with indirect wording. For example, when we asked Meta AI to create an image of former President Barack Obama, it refused. But, when we entered “a former US president” as the prompt, the AI generated a man that resembled President Obama. 

A former US president, according to Meta

(Image credit: Future)

There are plans to introduce “invisible watermarking… for increased transparency and traceability”, but it’s still weeks away from being released, and a lot of damage can be done in that short period. Misuse is something Meta is concerned about; however, there are still holes. We reached out asking if Meta aims to implement more protection. This story will be updated at a later time.

Until then, check out TechRadar's guide on the best AI art generators for the year.


Generative AI could get more active thanks to this wild Stable Diffusion update

Stability AI, the developer behind Stable Diffusion, is previewing a new generative AI that can create short-form videos from a text prompt.

Aptly called Stable Video Diffusion, it consists of two AI models (known as SVD and SVD-XT) and is capable of creating clips at a 576 x 1,024 pixel resolution. Users will be able to customize the frame rate to run between 3 and 30fps. The length of a video depends on which of the twin models is chosen: SVD renders 14 frames, while SVD-XT extends that a bit to 25 frames. The difference doesn’t matter too much, as rendered clips will only play for about four seconds before ending, according to the official listing on Hugging Face.
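Those numbers fit together with simple arithmetic: a clip's duration is its frame count divided by its frame rate. The 6fps figure below is just an illustrative pick from the supported 3-30fps range, not a confirmed default.

```python
def clip_seconds(frames, fps):
    # Duration of a rendered clip: frame count divided by playback rate.
    return frames / fps

# SVD renders 14 frames and SVD-XT renders 25.
print(round(clip_seconds(14, 6), 1))  # 2.3 seconds
print(round(clip_seconds(25, 6), 1))  # 4.2 seconds
```

At the top of the range (30fps), even SVD-XT's 25 frames last under a second, which is why the roughly four-second figure implies a low playback rate.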

The company posted a video on its YouTube channel showing off what Stable Video Diffusion is capable of, and the content is surprisingly high quality. The clips are certainly not the nightmare fuel you see from other AI models like Meta’s Make-A-Video. The most impressive, in our opinion, has to be the Ice Dragon demo: there's a high amount of detail in the dragon’s scales, and the mountains in the back look like something out of a painting. Animation, as you can imagine, is rather limited, as the subject can only slowly bob its head. The same can be seen in the other demos; it’s either a stiff walking cycle or a slow panning shot.

In the early stages

Limitations don’t stop there. Stable Video Diffusion reportedly cannot “achieve perfect photorealism”, it can’t generate “legible text”, and it has a tough time with faces. That said, another demonstration on Stability AI’s website does show the model rendering a man’s face without any weird flaws, so it could be a case-by-case thing.

Keep in mind that this project is still in the early stages. The model is clearly not ready for a wide release, nor are there any plans for one; Stability AI emphasizes that Stable Video Diffusion is not meant “for real-world or commercial applications” at this time, and is currently “intended for research purposes only.” We’re not surprised the developer is being very cautious with its tech: there was an incident last year where Stable Diffusion’s model leaked online, leading to bad actors using it to create deepfake images.

Availability

If you’re interested in trying out Stable Video Diffusion, you can join the waitlist by filling out a form on the company's website. It’s unknown when people will be let in, but the preview will include a text-to-video interface. In the meantime, you can check out the AI’s white paper and read up on all the nitty-gritty behind the project.

One interesting thing we found after digging through the document is that it mentions using “publicly accessible video datasets” as some of the training material. That's not surprising, considering that Getty Images sued Stability AI over data-scraping allegations earlier this year; it looks like the team is striving to be more careful so it doesn't make any more enemies.

No word on when Stable Video Diffusion will launch. Luckily, there are other options. Be sure to check out TechRadar's list of the best AI video makers for 2023.
