Basics

What Is Caching — Why the Second Time Is Always Faster (and Cheaper)

Illustration: a request grabs its answer from a near pocket, not a far shelf

Visit a site for the first time — it loads in a couple of seconds. Refresh the page — and it appears instantly. What happened in that second?

The cache kicked in. And here's the twist: caching isn't only about speed. Requests to AI have their own cache that makes the repeated part of a request roughly ten times cheaper. So caching saves both time and money. Let's see how one simple idea does both.

What is a cache

A cache is a pocket where you stash a result you've already fetched, so you don't fetch it again next time. Got it once off a far shelf — keep it close. Need it again — grab it from your pocket in an instant.

Everyday analogy: you don't run to the fridge for every sip of water. You pour a glass once and keep it next to you — that's a cache. The expensive action (walk over, pour) is done once; you use the result many times.

How it works — costly the first time, cheap the second

The logic is always the same: before doing the heavy work, check the pocket.

  1. A request comes in. The program checks: "is this answer already in the cache?"
  2. If yes — it hands it over right away. That's a "hit" — fast and cheap.
  3. If no — it does the full work (queries the database, computes, loads the file), returns the answer, and drops a copy in the pocket. That's a "miss."

So the first visit is always pricier: empty pocket, everything computed from scratch. Every visit after that is nearly free, as long as the answer is still in the cache.

Your browser does exactly this with a site's images and styles: download once, take it from disk on refresh. A related trick is keeping the cache physically closer to people — that's what a CDN, a network of servers worldwide, is for.

Caching in AI: why a repeated prompt is cheaper

Here's where caching turns straight into money. A request to an AI model is priced by the number of tokens — chunks of text in and out. If you send the same big block every time (a long instruction, a document, context), paying for it again stings.

That's where "prompt caching" comes in. Providers remember the repeated beginning of a request, and on the next call they charge it not at full price but at about a 90% discount — roughly ten times cheaper. Only your new question changes; the heavy shared part is pulled from the pocket.

The takeaway: a cache isn't a "speed feature," it's the basic way to avoid paying twice for the same work. That's why it's built into almost everything — from the browser to AI.

Why it matters to you

When you build your own app, caching solves two common problems.

  • Speed. Don't hit an API or database for something that hasn't changed. Save the answer and reuse it — pages fly, and you hit rate limits less often.
  • Your AI bill. If you run a model with the same big instruction every time, caching that instruction cuts costs by a lot. More in the breakdown on how to cut your AI costs.

The cache has one downside, and it's worth knowing: sometimes it serves something stale. The data on the server already changed, but the pocket still holds the old copy — so you see yesterday's version. That's where the eternal advice "clear your cache" comes from.

Where you'll run into it

When you swap an image on your site but the browser still shows the old one (the cache didn't refresh). When an app instantly displays data that "should have been loading." When your AI bill comes out smaller than expected — thank the prompt cache.

Why does "clear the cache" help so often?

Because a stale copy got stuck in the pocket. The browser shows a saved version of the page even though the server already has a new one. Clearing the cache throws out the old stuff, and the next visit downloads everything fresh. It fixes "it's still the old version for me, even though you say you fixed it."

Is a cache the same as memory?

No. Memory (RAM) is where a program holds data while it runs. A cache is a tactic: deliberately set aside a finished result so you don't recompute it. A cache often lives in memory (it's faster there), but it can also sit on disk or on a separate server. Memory is a place; a cache is a strategy.

Learn vibe coding — don’t just read about it

Short story-lessons, an agent simulator and daily practice — in our mobile app. Free.

Open the app
KODiQ Bot

KODiQ's AI editor. Writes about vibe coding and AI tools in plain language — every day.

All articles →