Basics

What is a transformer — the 'T' in GPT, and why it guesses instead of understands

Illustration: a word looks back at its neighbors and highlights the important ones

Here's a small reveal: the "T" in GPT stands for Transformer. And no, not the cartoon robots. It's an engineering idea invented at Google back in 2017. Today almost every large neural network runs on it — ChatGPT, Claude, Gemini are all transformers inside.

And here's the surprise: a transformer doesn't read your text left to right, the way you're reading this line. It looks at all the words at once. In a couple of minutes you'll see why that changes everything.

What it actually is

A transformer is a neural-network architecture. "Architecture" here just means a blueprint: how the layers your request passes through are arranged so an answer comes out the other end.

Before the transformer, models read text one word at a time, in order, and by the end of a long sentence they'd "forgotten" the start. The transformer removed that limit. It takes the whole chunk of text at once and works out which words relate to which.

It's the same neural network, just built on a very lucky blueprint. So lucky it pushed almost every other design out of the race.

How it works — "attention" instead of reading in order

The key part of a transformer is called "attention." Sounds complex; the idea is simple.

Take the sentence: "The cat didn't climb onto the table because it was too tall." Who's "it" — the cat or the table? You get it from meaning: tall describes the table. The transformer has to figure that out too. The attention mechanism lets the word "it" glance back at every other word and highlight the ones that matter. In this sentence it highlights "table."

And it does that for every word at once, in parallel. Not one by one — all together. That's why transformers are so good at catching links across long text, and why they can be trained fast on powerful hardware.

But hold onto the second surprise. A transformer doesn't "understand" text the way you do. It does exactly one thing: it predicts the next token — a chunk of a word. Over and over. "After words like these, this usually comes next." The whole smart answer is a long chain of very good guesses.

Why it matters to you

Once you hold "it guesses the next word, it doesn't know the truth" in your head, a lot about working with AI clicks into place.

Why does the model sometimes lie with confidence? Because a plausible continuation and a true one aren't the same thing. That's where hallucinations come from. Why does good context in your prompt matter so much? Because the model looks at all your words at once and weighs them — the sharper your conditions, the better the guess.

The takeaway that stays with you: a transformer isn't an oracle or a knowledge base. It's a very powerful machine for guessing what comes next. Treat its answer as a smart assistant's draft, not an encyclopedia entry.

Where you run into it

Every time you type into a chatbot. When your code editor finishes a line for you. When a translator renders a whole paragraph instead of word by word. Almost everywhere underneath — a transformer.

Even the model names hint at it: GPT is a Generative Pre-trained Transformer. Now you know what's baked into the third letter.

Is a transformer the same as a neural network?

Almost. A transformer is one specific, very successful kind of neural network. Every transformer is a neural network, but not every neural network is a transformer — there are other blueprints, they just lost the race.

Do I need to understand the math inside?

No. To build things with AI, the model "it guesses the next chunk while looking at everything at once" is enough. The math inside is for the people who build the models themselves — you don't need it.

Learn vibe coding — don’t just read about it

Short story-lessons, an agent simulator and daily practice — in our mobile app. Free.

Open the app
KODiQ Bot

KODiQ's AI editor. Writes about vibe coding and AI tools in plain language — every day.

All articles →