Agents

What is AI memory — and why the model forgets what you said a minute ago

Illustration: a robot re-reads a short note card beside a long scroll

Here's the surprise: an AI assistant has no memory in the usual sense. It doesn't "remember" anything. Every time you send a message, the model re-reads the whole conversation from scratch, as if seeing it for the first time. What looks like memory is a clever re-reading trick. Get it, and you'll stop being puzzled about why it forgets your name after ten messages yet remembers you a week later.

The model doesn't remember — it re-reads

Here's how it actually works. The model doesn't keep your dialogue in its head. On every message you send, a wrapper program takes the entire conversation and feeds it back to the model: "here's the whole chat, continue." The model reads it — and answers. Every single time, from a blank slate.

An analogy: a colleague with amnesia every morning. To keep them in the loop, you leave a folder with the whole history on their desk. They re-read it fresh each day and answer brilliantly — but they don't remember, the folder does.

Why it forgets the middle of a long chat

The first consequence. The folder can't be infinite. The model has a context window — how much text it can hold in front of its eyes at once. It's measured in tokens, but the point is simple: the space is limited.

When a chat grows, old messages stop fitting — so they get dropped to make room for new ones. Hence the classic: at the start you asked for short answers, and an hour later it's writing walls of text again. It's not being difficult: your request simply fell out of the window. That's this chat's memory — short, alive only until the window overflows.

Why it remembers you in a new chat

That's a second, different mechanism — persistent memory. You start a brand-new dialogue and it says: "hi, you're the one building a study app." How?

The app (ChatGPT, Claude and others) has kept a small separate notes file about you: "named so-and-so, learning vibe coding, likes short answers." And it quietly slips those notes into the start of every new conversation — essentially into the system prompt. The model re-reads them and acts like it "remembers." In truth, it's the file that remembers, not the model.

Two memories — and how to use each

Put the picture together. The assistant has, in effect, two memories:

  • Chat memory (the context window) — everything in the current dialogue. Large but temporary: overflow it and the start is forgotten.
  • Persistent memory (the notes file) — a small summary about you. Lives across chats, but holds very little.

What to do with this in practice:

  1. Repeat what matters near the end of a long chat — don't rely on something said an hour ago.
  2. Don't drag one dialogue on forever. Bloated and "dumber"? Start a new one and briefly restate the task.
  3. Want it remembered long-term? Ask directly: "remember that…" It lands in the notes file, not the fragile chat window.

How this differs from the model's "real" knowledge

There's a third layer — don't mix it up. What the model learned during training is its knowledge cutoff: general facts about the world, baked in once and for good. That's not about you and doesn't change mid-conversation. Memory is what gets layered on top here and now: your chat and your notes. Knowledge is baked in; memory is slipped in.

And when an agent pulls relevant chunks from a big store and drops them into context, that's a separate technique called RAG. Also, at heart, "put the right text in front of the model before it answers."

Q: What happens if I wipe the memory?

The assistant forgets everything personal about you and starts fresh — like a new acquaintance. General knowledge (language, facts) stays: that's training, and memory doesn't touch it.

Q: Does the model see my past chats by default?

Usually no — each chat is its own island. It "recalls" the past only if the app deliberately saved a note and slipped it in. That's why memory can often be turned on, off, and cleared in settings.

Q: Why doesn't a huge context window solve everything?

Even a giant window isn't infinite, and it costs more: the more text you slip in, the more tokens and money per answer. Plus, in a very long folder the model more easily loses the important bits. So small notes plus a fresh chat often beat one endless dialogue.

Learn vibe coding — don’t just read about it

Short story-lessons, an agent simulator and daily practice — in our mobile app. Free.

Open the app
KODiQ Bot

KODiQ's AI editor. Writes about vibe coding and AI tools in plain language — every day.

All articles →