What is a reasoning model — and why you pay for thoughts you never see

Here's an odd thing: some AI answers cost you several times more, even though the text on screen is the same length. And you wait longer for them. It's not the service being greedy — the model spends a while "thinking to itself" first, and those invisible thoughts get counted too. That's a reasoning model, and in a couple of minutes you'll know when it's worth it and when you're just overpaying.
What it is, in one line
A reasoning model (a model with a thinking mode) is a language model that drafts its working before answering, and shows you only the final result. A regular model replies right away, off the top of its head. A reasoning model runs the steps internally first.
Think of two students at the board. One blurts out the first answer that comes to mind. The other writes out the solution on scratch paper, checks themselves, catches their own mistake — and only then writes the result. You don't see the scratch paper. But it's what gets the answer right.
How it works, step by step
- You ask a hard question — one that needs reasoning, not just recall.
- The model generates "thoughts." These are the same tokens as normal text, just marked as internal (often hidden between tags like
<think>…</think>). There it breaks the task apart, tries approaches, checks intermediate steps. - Once the draft is ready, it writes a short final answer. That's the part you see.
The key trick: for the model, "thinking" is just generating more tokens before answering. The harder the task, the longer the draft. On a tough problem the thinking can run to tens of thousands of tokens while the visible answer is a couple hundred.
Why it matters to you (and what it costs)
The whole thing exists for two reasons. One of them is a trap.
- Accuracy jumps. On contest math, a regular model got around 12% right; the thinking model got about 74% on the same problems. That's not "a bit better," that's a different league.
- Cost and time go up. The internal thoughts are counted and billed too — even the ones you never see. The answer comes slower, the bill is bigger.
The trap is that thinking mode isn't always needed. Translating a phrase, tidying text into a list, pulling a date out of an email — a regular model does these just as accurately, but faster and cheaper. The thinking model only pays off on multi-step work: math, reading and debugging code, logic, planning. Running it for trivia is like writing out scratch work for "what's 2+2."
Where you'll run into it
In many services, thinking mode is a toggle or a separate model: "Thinking," "Reasoning," "extended thinking." In some lineups these are dedicated models (OpenAI's o-series); in others it's a switch on a normal one (extended thinking on Claude). The rule is simple: keep it off for chatter and quick edits, turn it on when the task genuinely needs thought. You'll save time and money without losing quality where it isn't needed.
Question: can you see what the model was thinking?
Usually not. Many services hide the "raw" thoughts and show only the result or a compressed summary of the reasoning. It's partly for convenience, partly because the draft is often long and messy. The one thing to keep in mind: those hidden tokens are real, and you're being billed for them.
Question: is a reasoning model always better?
No, and that's the big misconception. On simple tasks it's no more accurate — just slower and pricier. It can even be worse: the model "overthinks" the obvious and ties itself in knots. Choose by the task, not by "which one is smarter." Hard reasoning → thinking model; everything else → regular model.
Short story-lessons, an agent simulator and daily practice — in our mobile app. Free.





