What is AI memory — and why the model forgets what you said a minute ago

Here's the surprise: an AI assistant has no memory in the usual sense. It doesn't "remember" anything. Every time you send a message, the model re-reads the whole conversation from scratch, as if seeing it for the first time. What looks like memory is a clever re-reading trick. Get it, and you'll stop being puzzled about why it forgets your name after ten messages yet remembers you a week later.
The model doesn't remember — it re-reads
Here's how it actually works. The model doesn't keep your dialogue in its head. On every message you send, a wrapper program takes the entire conversation and feeds it back to the model: "here's the whole chat, continue." The model reads it — and answers. Every single time, from a blank slate.
An analogy: a colleague with amnesia every morning. To keep them in the loop, you leave a folder with the whole history on their desk. They re-read it fresh each day and answer brilliantly — but they don't remember, the folder does.
Why it forgets the middle of a long chat
The first consequence. The folder can't be infinite. The model has a context window — how much text it can hold in front of its eyes at once. It's measured in tokens, but the point is simple: the space is limited.
When a chat grows, old messages stop fitting — so they get dropped to make room for new ones. Hence the classic: at the start you asked for short answers, and an hour later it's writing walls of text again. It's not being difficult: your request simply fell out of the window. That's this chat's memory — short, alive only until the window overflows.
Why it remembers you in a new chat
That's a second, different mechanism — persistent memory. You start a brand-new dialogue and it says: "hi, you're the one building a study app." How?
The app (ChatGPT, Claude and others) has kept a small separate notes file about you: "named so-and-so, learning vibe coding, likes short answers." And it quietly slips those notes into the start of every new conversation — essentially into the system prompt. The model re-reads them and acts like it "remembers." In truth, it's the file that remembers, not the model.
Two memories — and how to use each
Put the picture together. The assistant has, in effect, two memories:
- Chat memory (the context window) — everything in the current dialogue. Large but temporary: overflow it and the start is forgotten.
- Persistent memory (the notes file) — a small summary about you. Lives across chats, but holds very little.
What to do with this in practice:
- Repeat what matters near the end of a long chat — don't rely on something said an hour ago.
- Don't drag one dialogue on forever. Bloated and "dumber"? Start a new one and briefly restate the task.
- Want it remembered long-term? Ask directly: "remember that…" It lands in the notes file, not the fragile chat window.
How this differs from the model's "real" knowledge
There's a third layer — don't mix it up. What the model learned during training is its knowledge cutoff: general facts about the world, baked in once and for good. That's not about you and doesn't change mid-conversation. Memory is what gets layered on top here and now: your chat and your notes. Knowledge is baked in; memory is slipped in.
And when an agent pulls relevant chunks from a big store and drops them into context, that's a separate technique called RAG. Also, at heart, "put the right text in front of the model before it answers."
Q: What happens if I wipe the memory?
The assistant forgets everything personal about you and starts fresh — like a new acquaintance. General knowledge (language, facts) stays: that's training, and memory doesn't touch it.
Q: Does the model see my past chats by default?
Usually no — each chat is its own island. It "recalls" the past only if the app deliberately saved a note and slipped it in. That's why memory can often be turned on, off, and cleared in settings.
Q: Why doesn't a huge context window solve everything?
Even a giant window isn't infinite, and it costs more: the more text you slip in, the more tokens and money per answer. Plus, in a very long folder the model more easily loses the important bits. So small notes plus a fresh chat often beat one endless dialogue.
Short story-lessons, an agent simulator and daily practice — in our mobile app. Free.





