What is a rate limit — why an API answers 429 and asks you to wait

You're sending requests to an API, everything's working — and suddenly the answer is the number 429 and the words "Too Many Requests." The first thought is usually panic: "I broke the key" or "there's a bug in my code."
Here's the surprise: nothing is broken. The service didn't refuse you for good; it's politely asking you to slow down. In a couple of minutes you'll know exactly what to do and stop panicking at the sight of 429.
What a rate limit is
A rate limit caps how many times you can call a service in a given window of time. For example, "60 requests per minute" or "1000 per day." Cross the line and the next request won't run — the answer comes back as a 429.
Why does the service do this? So that one user — or a runaway loop in your code — can't flood it with a million requests and take the server down for everyone else. It's like a turnstile in a subway: it lets the flow through, but strictly one at a time.
Nearly every API lives with this, paid and free alike. There's almost always a limit; it's usually just higher on a paid plan.
How it works — window, counter, and 429
Picture a counter with a timer. The service opens a "window" — say, one minute. Each of your requests bumps the counter. Hit the limit and every new request gets a 429 until the window ends. The window resets, the counter clears, and you're good again.
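To make the counter-and-timer picture concrete, here is a minimal sketch of what the server side might look like. It assumes the simplest scheme, a fixed window; real services may count differently, and the class name, method name, and clock parameter are invented for this illustration.

```python
import time


class FixedWindowLimiter:
    """Toy fixed-window counter: at most `limit` requests per window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the sketch can run without real waiting
        self.window_start = clock()
        self.count = 0

    def handle_request(self):
        """Return the status code a server using this scheme would send."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The window ended: open a new one and clear the counter.
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            return 429  # over the limit until the window resets
        self.count += 1
        return 200
```

With a limit of 3 per 60 seconds, the fourth request inside the same window gets a 429, and the first request after the window ends gets a 200 again.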
Often the server tells you exactly how long to wait. The response may carry a Retry-After: 30 header — "come back in 30 seconds." Or X-RateLimit-Remaining: 0 — "you're out of requests for this window." These aren't decoration; they're meant to be read.
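Reading those headers could look roughly like this. It is a sketch, not any particular library's API: the function names and the default wait are made up here, and `headers` is assumed to be a plain dict keyed exactly as shown (real HTTP clients usually give you a case-insensitive mapping). Retry-After may hold either a number of seconds or an HTTP date, so both are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def seconds_to_wait(headers, default=1.0, now=None):
    """How long to pause according to Retry-After, else `default` seconds."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)  # form 1: plain seconds, e.g. "30"
    try:
        when = parsedate_to_datetime(value)  # form 2: an HTTP date
    except (TypeError, ValueError):
        return default  # unreadable value: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def out_of_requests(headers):
    """True when the server reports zero requests left in this window."""
    return headers.get("X-RateLimit-Remaining", "").strip() == "0"
```

X-RateLimit-Remaining is a widespread convention rather than a guarantee, so check the documentation of the API you are calling for the exact header names it sends.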
Don't mix up the codes. A 429 isn't a 401 or 403 (those are key or permission problems) and isn't a 500 (something failed on the server's side). A 429 says exactly one thing: "you're fine, just too often."
Why it matters to you
When you build something with AI, you call other people's APIs constantly: the model itself, weather, a map, a database. Almost every one of them has its own limit, and it's especially strict on the free tier. Sooner or later you'll hit it.
Two habits that save you:
- Don't retry instantly. Got a 429? Wait and try again, ideally with a growing pause: 1 second, then 2, then 4. It's called "exponential backoff," and many official client libraries do it for you.
- Don't ask for the same thing twice. If the data doesn't change every second, save the answer (cache it) and reuse it. Fewer requests means you hit the limit less.
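If your library doesn't handle the first habit for you, the growing pause can be sketched in a few lines. Everything here is illustrative: `send` stands for any function of yours that makes one request and returns a `(status, body)` pair, and the attempt count and starting delay are arbitrary choices.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it stops answering 429 or attempts run out."""
    delay = base_delay
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body  # success or a different problem: stop retrying
        if attempt < max_attempts - 1:
            sleep(delay)  # wait 1 s, then 2 s, then 4 s ...
            delay *= 2
    return status, body  # still 429 after the last attempt
```

Production code often adds a little random "jitter" to each pause so that many clients don't all retry at the same instant, and prefers the server's Retry-After value when one is given.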
The takeaway that stays with you: a 429 isn't a reason to panic and isn't a bug. It's an "ease off" signal. The cure is a pause, a cache, or a paid plan, not a frantic rewrite of your code.
Where you run into it
When the AI model in your app suddenly stops answering at peak load. When the free key to a weather service "runs out" by lunchtime. When your Telegram bot fires messages too fast and some just don't arrive.
Anywhere there's a connection to an API, a rate limit is waiting around the corner. Once you know about it, these situations stop being a mystery.
Did I break something with a 429?
No. A 429 means "too many requests in a short time." Your code is probably fine — it's just sending requests faster than the service allows. Add a pause between requests and it'll move along.
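One simple way to add that pause is to space requests evenly so you never exceed the limit in the first place. A rough sketch, assuming the service allows a known number of requests per minute; the function name and parameters are invented for this example.

```python
import time


def paced(items, per_minute, sleep=time.sleep):
    """Yield items no faster than `per_minute` items per minute."""
    interval = 60.0 / per_minute
    for index, item in enumerate(items):
        if index > 0:
            sleep(interval)  # pause before every item except the first
        yield item
```

Looping over `paced(cities, per_minute=60)` and making one request per city, for instance, keeps the requests at least one second apart.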
How do I bypass a rate limit?
There's no honest "bypass," and you shouldn't try — keys get banned for it. There are three right moves: wait (per the Retry-After header), cache repeated answers, and, if you genuinely hit the ceiling, move to a paid plan with a higher limit.
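Caching repeated answers can be as small as a dictionary with an expiry time. This is a sketch under assumptions, not a finished cache: the class name, the `fetch` callback, and the time-to-live value are all made up for illustration.

```python
import time


class TTLCache:
    """Remember answers for `ttl_seconds` so repeat questions cost no request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the sketch can run without real waiting
        self._store = {}    # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no API call needed
        value = fetch()    # missing or expired: make the real request
        self._store[key] = (now + self.ttl_seconds, value)
        return value
```

With a 10-minute time-to-live, asking for the same city's weather a hundred times in a row costs a single request instead of a hundred.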