What is a diffusion model — why AI images are born from noise

KODiQ Bot

Jun 25, 2026 · 5 min read

Illustration: a crisp picture emerging step by step from grainy noise

Here's a counterintuitive thing: when AI draws an image from your description, it doesn't move a brush or lay it out pixel by pixel. It starts with a screen of pure "TV static" — and step by step removes what doesn't belong, until a cat in a spacesuit emerges from the grain. The image isn't drawn. It's developed, like a photo in a tray. That's a diffusion model.

What it is, in one line

A diffusion model is a neural network trained to turn random noise into an image by gradually removing that noise. Most modern image generators work exactly this way.

Think of a sculptor and a block of marble. They don't "add" the statue, they chip away what's extra until it emerges. A diffusion model chips away noise the same way: formless grain at the start, a crisp image at the finish. Except it doesn't chip at random — it chips knowing what should appear.

How it works, step by step

There's a clever twist: to learn to remove noise, the model first learned to add it.

Training in reverse. Take millions of real images and gradually "ruin" each one: add more and more noise until only pure grain is left. During training the model is shown images at every noise level, from barely touched to almost pure grain.
The model learns to predict noise. Its core skill is to look at a noisy image and guess which part of it is noise. If the guess is good, subtracting it makes the image a little cleaner.
Generation — running it backwards. Now hand it pure noise. It predicts what's "extra," removes a bit, looks again, removes more — dozens of times over. With each step the grain turns into a picture.
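
To make the three steps concrete, here is a minimal NumPy sketch, not any real generator's code. The linear noise schedule, the 8x8 random array standing in for an image, and the function names (make_alpha_bar, add_noise, noise_loss, remove_predicted_noise) are all assumptions made up for illustration. It shows the "ruining" step, the score a model would be trained to minimize, and why a good noise guess is enough to clean the image.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Share of the original signal (squared) left at each noise level."""
    betas = np.linspace(beta_start, beta_end, num_steps)  # assumed linear schedule
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """'Ruin' a clean image up to noise level t in a single jump."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def noise_loss(eps_pred, eps_true):
    """Training score: how far the noise guess is from the noise actually added."""
    return float(np.mean((eps_pred - eps_true) ** 2))

def remove_predicted_noise(x_t, eps_pred, t, alpha_bar):
    """Undo add_noise using a guess for the noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a tiny image

x_light, _ = add_noise(x0, 10, alpha_bar, rng)           # still mostly image
x_heavy, eps_heavy = add_noise(x0, 999, alpha_bar, rng)  # almost pure grain
x_mid, eps_mid = add_noise(x0, 500, alpha_bar, rng)

# A perfect noise guess gives zero loss and recovers the clean image.
x0_recovered = remove_predicted_noise(x_mid, eps_mid, 500, alpha_bar)
print(noise_loss(eps_mid, eps_mid))   # 0.0
print(np.allclose(x0_recovered, x0))  # True
```

A real model never sees the true noise at generation time, so its guess is imperfect. That is why it removes only a bit per step and looks again.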

So where's your prompt? The text "cat in a spacesuit" is the steering wheel. It directs which noise to remove at each step, so a cat emerges and not a dog. A separate part handles understanding the text — often a transformer, the same kind of model that powers chatbots.
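
One common way to wire in that steering wheel is called classifier-free guidance. Whether a particular generator uses it is an assumption here, and the numbers below are made up rather than real network outputs. The idea: at each step the network guesses the noise twice, once with the prompt and once without, and the sampler exaggerates the difference between the two guesses.

```python
import numpy as np

def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: push the noise guess toward the prompt."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Made-up stand-ins for two network outputs on the same noisy image.
eps_no_prompt = np.array([0.2, -0.1, 0.5])    # prompt left empty
eps_with_prompt = np.array([0.4, -0.3, 0.5])  # prompt "cat in a spacesuit"

print(guided_noise(eps_no_prompt, eps_with_prompt, 0.0))  # ignores the prompt
print(guided_noise(eps_no_prompt, eps_with_prompt, 1.0))  # plain prompted guess
print(guided_noise(eps_no_prompt, eps_with_prompt, 7.5))  # exaggerates the difference
```

A higher scale makes the image follow the text more literally, usually at the cost of variety.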

Why it matters to you

Understand the mechanism and you stop being surprised by generators' quirks — and start steering them.

Why it's slow and heats your GPU. An image isn't one pass but dozens of "remove the noise" steps, and each step is a full pass through the network. Fewer steps are faster but rougher; more steps are slower but cleaner.
Why the result differs every time. The start is random noise. A different grain (set by a seed number) means a different image for the same prompt. Fix the seed and you get a repeatable result.
Where the mangled hands and extra fingers come from. The model develops plausible texture, it doesn't compute anatomy. So fine logic (fingers, text on a sign) is harder for it than the overall picture.
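
The first two quirks can be reproduced in a toy, sketched here under heavy assumptions. ToyDenoiser is a hypothetical stand-in for the trained network: instead of learning, it uses the exact best noise guess for an imaginary dataset whose pixels are Gaussian around one mean image. The sample function is a deterministic, DDIM-style loop written for this example, not a library API, and the noise schedule is an assumed linear one.

```python
import numpy as np

# Assumed linear noise schedule over 1000 training steps (illustrative values).
BETAS = np.linspace(1e-4, 0.02, 1000)
ALPHA_BAR = np.cumprod(1.0 - BETAS)

class ToyDenoiser:
    """Hypothetical stand-in for the trained network.

    It returns the exact best noise guess when every pixel of the "dataset"
    is Gaussian around `mean_image` with standard deviation `spread`.
    """
    def __init__(self, mean_image, spread):
        self.mean_image = mean_image
        self.spread = spread
        self.calls = 0  # one call = one pass through the "network"

    def predict_noise(self, x, alpha_bar_t):
        self.calls += 1
        variance = alpha_bar_t * self.spread**2 + (1.0 - alpha_bar_t)
        centered = x - np.sqrt(alpha_bar_t) * self.mean_image
        return np.sqrt(1.0 - alpha_bar_t) * centered / variance

def sample(denoiser, shape, num_steps, seed):
    """Deterministic DDIM-style sampling: all randomness comes from the seed."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # the starting grain
    timesteps = np.linspace(len(ALPHA_BAR) - 1, 0, num_steps).round().astype(int)
    for i, t in enumerate(timesteps):
        ab_t = ALPHA_BAR[t]
        ab_prev = ALPHA_BAR[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        eps = denoiser.predict_noise(x, ab_t)
        x0_guess = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0_guess + np.sqrt(1.0 - ab_prev) * eps
    return x

mean_image = np.linspace(-1.0, 1.0, 64 * 64).reshape(64, 64)
denoiser = ToyDenoiser(mean_image, spread=0.5)

rough = sample(denoiser, mean_image.shape, num_steps=5, seed=42)
calls_rough = denoiser.calls
clean = sample(denoiser, mean_image.shape, num_steps=50, seed=42)
calls_clean = denoiser.calls - calls_rough
repeat = sample(denoiser, mean_image.shape, num_steps=50, seed=42)
other = sample(denoiser, mean_image.shape, num_steps=50, seed=7)

print(calls_rough, calls_clean)       # 5 50
print(np.array_equal(clean, repeat))  # True: same seed, same picture
print(np.array_equal(clean, other))   # False: new seed, new picture
# Detail around the mean image; the "dataset" has a spread of 0.5.
print(np.std(rough - mean_image), np.std(clean - mean_image))
```

In this toy, "rougher" shows up as a washed-out result: with 5 steps the sample stays much closer to the average image than the data would, while with 50 steps its spread lands near the expected 0.5. The cost is exactly ten times as many network calls.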

Where you'll run into it

Anywhere AI makes images from text: picture, avatar, icon, and background generators. Diffusion is currently the dominant approach to image generation, and it's part of a bigger theme: multimodality, where a model works not only with text but with images, audio, and video. The same "out of noise" principle is now being tried for video, and even for generating text.

Question: how is a diffusion model different from a transformer?

They're about different things and often work together. A transformer is a network architecture that is especially good with text: it reads the prompt, holds a conversation. Diffusion is a way of generating: it develops a picture out of noise. In a typical image generator, a transformer-based text encoder reads your "cat in a spacesuit" and the diffusion process paints it. Some newer generators even use a transformer as the denoising network itself. They're not competitors but different tools.

Question: why does the same phrase give different pictures?

Because each run starts from random noise. Change the starting grain and the result changes, even with the same text. That's not a bug, it's a feature: it lets you cycle through variations until you like one. And if you need the exact same result — set a fixed seed, and the start stops being random.
