🎉 All 30 days are live — the full DSA-30 course, from Big-O to System Design. See the roadmap →

Design a Rate Limiter easy

The prompt

Limit how many requests a client can make in a window — e.g. 100 requests per minute per user. Excess requests get rejected with 429 Too Many Requests. Used everywhere: API gateways, login endpoints (brute-force defense), and protecting expensive downstream services.

Requirements

  • Functional: allow or reject a request based on a configurable limit (per user / IP / API key). Return 429 with a Retry-After header when throttled.
  • Non-functional: low latency (it’s on every request’s hot path — must add sub-millisecond overhead), accurate (don’t let bursts through), and distributed (works across many app servers, not just one).

Estimation

If the protected API does 100k QPS, the limiter is also called 100k times/s. Each check is a tiny counter read/write — perfect for an in-memory store like Redis. Memory is trivial: one small counter per active client.

The core algorithms

AlgorithmIdeaTrade-off
Fixed windowcount requests per clock-aligned minutesimple, but allows a 2× burst at the window boundary
Sliding window logstore a timestamp per request, count those in the last 60 sexact, but stores every timestamp → memory-heavy
Sliding window counterweighted blend of current + previous windownear-exact, cheap — the common production choice
Token bucketbucket refills at a fixed rate; each request takes a tokenallows controlled bursts, very popular
Leaky bucketrequests queue and drain at a fixed ratesmooths bursts into a steady output

Token bucket is the most-loved answer. A bucket holds up to N tokens and refills at r tokens/sec. Each request removes one token; if the bucket is empty, reject. It naturally allows short bursts (up to the bucket size) while enforcing the average rate — which matches how real clients behave. Store {tokens, last_refill_time} per client and compute the refill lazily on each request.

The fixed-window boundary trap

Fixed window’s flaw, worth naming: with a 100/min limit, a client can send 100 requests at 00:59 and 100 more at 01:00200 requests in 2 seconds, because each lands in a different counter window. Sliding-window approaches fix this by considering a rolling 60-second view rather than clock-aligned buckets.

Where it lives

The limiter sits at the edge, backed by a shared counter store
requestINCR + checkallowed →ClientsAPI Gatewayrate limiterRediscountersApp Server
Put the limiter at the API gateway / edge so rejected traffic never reaches your app servers. A central Redis holds counters so the limit is shared across the whole fleet — not per-server.

Deep dive — making it distributed

The tricky part: with 50 app servers, a per-server counter lets a client do 50 × limit. The fix is a central store (Redis) holding one counter per client, shared by all servers. To avoid a race (two servers reading the same count simultaneously), make the read-increment-check atomic — a Redis INCR (with EXPIRE) or a small Lua script does this in one round trip.

⚠️

The limiter must fail open or closed — decide which. If Redis is briefly unavailable, do you reject all traffic (fail closed, safe but a total outage) or allow all traffic (fail open, available but unprotected)? For a public API protecting expensive backends, many choose fail-open with a local fallback limiter. Naming this decision shows you’ve thought past the happy path.

Analysis

  • Latency added: one Redis round trip (~0.5 ms in-datacenter), often pipelined or done with a local token-bucket cache for the truly hot path.
  • Memory: O(active clients) — one small record per client.

Same skin

  • Login throttling / brute-force defense — rate limit failed logins per account + per IP.
  • API quota / billing tiers — same counters, longer windows (per day/month).
  • Leaky bucket for traffic shaping — smooth a spiky producer into a steady stream (closely related to a message queue draining at fixed rate).
  • The “sliding window” counting idea is the same shape as the Day 24 sliding-window algorithmic pattern — a rolling count over a time-ordered stream.