Stable Diffusion and DALL·E 3 were two of the first big names in AI image generation models. And although the company behind Stable Diffusion has had quite a chaotic year, they're still both incredibly popular models, so it's worth knowing where they stand—and how they compare.
So, let's look at which image generator you should use: Stable Diffusion, DALL·E 3, or something else?
How do Stable Diffusion and DALL·E 3 work?
All the modern AI text-to-image models were trained on millions or billions of text-image pairs. This allows them to comprehend concepts like dogs, deerstalker hats, and dark moody lighting, and it's how they can understand what a prompt like "an impressionist oil painting of a Canadian man riding a moose through a forest of maple trees" is actually asking them.
For image generation, Stable Diffusion and DALL·E 3 both rely on a process called diffusion. The image generator starts with a random field of noise, and then edits it in a series of steps to match its interpretation of the prompt. By starting with a different set of random noise each time, they can create different results from the same prompt. It's kind of like looking up at a cloudy sky, finding a cloud that looks kind of like a dog, and then being able to snap your fingers to keep making it more and more dog-like.
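That denoising loop can be sketched in a few lines of Python. This is a toy illustration only: the target array, step count, and step size are invented for the example, and a real diffusion model uses a trained neural network (guided by your prompt) to predict the noise to remove at each step.

```python
import numpy as np

# Toy sketch of diffusion: start from random noise and denoise it
# step by step. A fixed "target" stands in for the model's
# interpretation of the prompt. Hypothetical illustration only,
# not real Stable Diffusion or DALL·E code.

rng = np.random.default_rng(seed=7)  # a different seed = a different result

target = np.ones((8, 8))             # stand-in for "what the prompt asks for"
image = rng.normal(size=(8, 8))      # step 0: a random field of noise

for step in range(50):
    # each denoising step nudges the image a little closer to the target
    image += 0.1 * (target - image)

# after 50 steps, the noise has almost entirely been "denoised" away
print(np.abs(image - target).max())
```

Starting from a different random seed leads the same loop to a different final image, which is why one prompt can yield many different results.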
Even though both models have similar technical underpinnings, there are plenty of differences between them.
Stability AI (the makers of Stable Diffusion) and OpenAI (the makers of DALL·E 3) have different philosophical approaches to how these kinds of AI tools should work. They were also trained on different data sets, with different design and implementation decisions made along the way. So although you can use both to do the same thing, they can give you totally different results.
Here's the prompt I mentioned above from Stable Diffusion:
And here it is from DALL·E 3:
Something else to keep in mind:
DALL·E 3 is only available through ChatGPT, the Bing Image Creator, Microsoft Paint, and other services using its API.
Stable Diffusion is actually a number of open source models. Stability AI, the company behind Stable Diffusion, is now pushing its own ChatGPT alternative called Stable Assistant. But you can access earlier versions of Stable Diffusion through most AI art generators and lots of other tools that have an integrated image generator. You can also download the latest version of Stable Diffusion, install it on your own computer, and even train it on your own data. (This is how many services like Lensa's AI avatars work.)
I'll dig into what this all means a little later, but for ease of comparison, I'll mostly be comparing the models as they're accessed through their most official web apps: ChatGPT for DALL·E 3 and Stable Assistant for Stable Diffusion.
Stable Diffusion vs. DALL·E 3 at a glance
Stable Diffusion and DALL·E 3 are built using similar technologies, but they differ in a few important ways. Here's a short summary of things, but read on for the details.
|  | Stable Diffusion | DALL·E 3 |
| --- | --- | --- |
| Official web app | Stable Assistant | ChatGPT |
| Quality | ⭐⭐⭐⭐⭐ Exceptional AI-generated images | ⭐⭐⭐⭐⭐ Exceptional AI-generated images |
| Ease of use | ⭐⭐ Plenty of options and different models, which makes things complicated | ⭐⭐⭐⭐⭐ Collaborate with a chatbot |
| Power and control | ⭐⭐⭐⭐ Specifics depend on how you use it, but Stable Diffusion gives you a lot of control | ⭐⭐⭐ You can ask the chatbot to make changes to the whole image or a specific area, but not a whole lot else |
Both make great AI-generated images
Let's get the big thing out of the way: both Stable Diffusion and DALL·E 3 are capable of producing incredible AI-generated images. I've had heaps of fun playing around with both models, and I've been shocked by how they've nailed certain prompts. I've also laughed quite hard at both their mess-ups. Really, neither model is objectively—or even subjectively—better than the other. At least not consistently.
If I were forced to highlight where the models differ, I'd say that:
By default, Stable Diffusion tends toward more photorealistic images, though it can subtly mess up things like faces, while DALL·E 3 makes things that look more abstract or computer-generated.
DALL·E 3 feels better "aligned," so you may see fewer stereotypical results, and it has stronger content filters, so you can't make images of real people or depictions of illegal activity.
Though, again, the results you get really depend on what you ask for—and how much prompt engineering you're prepared to do.
DALL·E 3 is super easy to use
DALL·E 3 is incredibly simple to use. Open up ChatGPT, and so long as you're a ChatGPT Plus subscriber, you can chat away and make requests. There are even suggestions of different ideas and styles you can try if you need a little inspiration.
If you aren't a ChatGPT Plus subscriber, you can try DALL·E 3 through Bing Chat or Microsoft Image Creator. But I'm focusing on using it through ChatGPT here—it's the most consistent way with the most control.
Stable Diffusion is a lot messier
Out of the box, Stable Diffusion is a lot more confusing, since it's a family of open models—not a single model. The three most relevant to this comparison are:
Stable Diffusion Ultra or Stable Diffusion 3. SD 3 was released in 2024 amid a fair bit of controversy: it came with a newly restrictive license, its output was widely considered worse than SDXL 1.0 (the previous model), and it was even banned by some of the top AI image generator platforms. Stability AI has since backtracked on the license terms, and SD 3 is available through the Stable Assistant chatbot, which is where I'm using it for this article.
Stable Diffusion XL. At one point, this was the best and most popular AI image generation model. It's still very popular on art generation platforms like NightCafe, where you can find lots of variations.
Stable Diffusion 1.5. This is an older Stable Diffusion model. Because it's smaller than other image models, it's cheap to fine-tune and is still available on art generation platforms.
To make things even more confusing, several of the original Stable Diffusion researchers left Stability AI and founded Black Forest Labs. They recently launched a new family of open models called FLUX that is replacing Stable Diffusion as the go-to open text-to-image model.
All this means that while Stable Diffusion is still popular and widely available, its latest model isn't as successful as its predecessors. I'm using it through the utterly mediocre Stable Assistant chatbot (more on that later), but that's not how most people use Stable Diffusion.
So, even though you can sign up for Stable Assistant, type in a prompt, and get an image as good as anything produced by DALL·E 3, there's a lot of drama going on behind the scenes, which makes things more complicated.
Stable Diffusion is more powerful
For all its ease of use, DALL·E 3 doesn't give you many options. If you don't like the results, you can ask ChatGPT to have another go, and it will tweak your prompt and regenerate the image.
Alternatively, you can use the select tool to highlight the areas of the image you want it to change.
And DALL·E 3 will do its best to incorporate your requests.
These editing features are a lot more powerful than they were when DALL·E 3 first launched, but you still can't incorporate your own images, expand a generated image, or make big changes without completely changing the image.
Even the Bing tools that use DALL·E 3 don't give you many more options. The only ones of note are that Image Creator allows you to import your image directly into Microsoft Designer, and Paint allows you to generate images in the app, so you can edit them (or at least paint over the top).
Stable Diffusion (in almost every iteration) gives you more options and control. You can add elements, replace things, expand your canvas, and generally tweak your image far more using a variety of tools. Even the otherwise dreadful Stable Assistant offers more options than DALL·E 3.
Finally, if you want to build a generative AI that's custom-trained on specific data—such as your own face, logos, or anything else—you can do that far more readily with Stable Diffusion. This allows you to create an image generator that consistently produces a particular kind or style of image. The specifics of how you do this are far beyond the scope of this comparison, but the point is that this is something that Stable Diffusion is designed to do that isn't really possible with DALL·E 3—at least not without diving deep into configuring your own custom GPT, and even then, your options are far more limited.
Pricing makes things more confusing
DALL·E 3's pricing is super simple: it costs $20/month as part of ChatGPT Plus, or it's available for free as part of different Microsoft tools, though some of them will watermark your images.
Stable Diffusion is available at every price point. You can download it and run it yourself, or you can get free images through Clipdrop and the various art generator platforms. The easiest way to use the latest version of Stable Diffusion is through Stable Assistant, which starts at $9/month after a three-day free trial. Unfortunately, while it's decent as an image generator, it's pretty bad as a chatbot. It doesn't compete with ChatGPT, Claude, or Gemini.
All this is to say that Stable Diffusion is definitely worth checking out for free, but if you're prepared to pay for a chatbot, DALL·E 3 is the better option as ChatGPT is much better than Stable Assistant. (If you just care about the image generator, you might find some value in Stable Assistant, but I'd suggest checking out Midjourney instead.)
Commercial use is complicated for both
If you're planning to use Stable Diffusion or DALL·E 3 for commercial use, things get a bit complicated.
Commercial use is currently allowed by both models, with a couple of caveats. With DALL·E 3, commercial use is allowed, but not if you access it through Microsoft's tools. Stable Diffusion's models are covered by different terms, but the latest license agreement allows unrestricted commercial use for businesses making less than $1 million in revenue; above that, you need a license. Still, even with commercial use allowed by the tools, the broader legal implications of AI-generated images haven't been fully worked out.
Purely from a creative standpoint, Stable Diffusion has a slight edge. Its models have fewer guardrails (fewer still if you fine-tune one yourself), so you can create more kinds of content. DALL·E 3 blocks a wide range of requests, including images of public figures.
Stable Diffusion was happy to oblige.
DALL·E 3 vs. Stable Diffusion: Which should you use?
Neither DALL·E 3 nor Stable Diffusion is a clear winner when you compare them head-to-head. DALL·E 3 is easy to use and safe, though it costs $20/month through ChatGPT. If you use ChatGPT a lot, it's well worth it, but it's a steep price just for the image generator.
Stable Diffusion, on the other hand, is kind of a mess at the moment. Stable Assistant is bad at being a chatbot, and while the older versions of Stable Diffusion are still available in lots of places, they're being replaced by FLUX. I'm not saying that you shouldn't try out Stable Diffusion in some form, but I'm not sure how much longer it will be considered a major image generator. There's a new CEO, and James Cameron has joined the board of Stability AI, so maybe they can right the ship?
Either way, the decision doesn't really come down to the quality of the generated output but rather what you're looking for. You can give both models a try—or check out Midjourney, FLUX, Ideogram, or any of the other image models out there. They can all create awesome, hilarious, and downright bizarre images from the right prompt. Give a few a try, and see which one you like best.
Related reading:
This article was originally published in May 2023. The most recent update was in October 2024.