Design a URL Shortener (TinyURL) easy
This problem gets the full end-to-end treatment in the URL Shortener case study — including the 7-character key-space math and three approaches to unique-code generation. This page is the condensed, interview-pace version you’d actually narrate in 20 minutes. Read both: the case study for depth, this for the rhythm.
The prompt
Turn a long URL into a short one (https://tiny.co/aZ3kP9q) and redirect on visit.
Requirements
- Functional: shorten a URL; redirect a short URL to the original. (Optional: custom alias, expiry, analytics.)
- Non-functional: very read-heavy (clicks ≫ creates), low-latency redirect (sub-100 ms), highly available (lean AP), short codes unique and non-sequential.
Estimation
100 M new URLs/day, read:write = 100:1:
- Writes ≈ 1.2k/s (peak ~3.5k/s); reads ≈ 116k/s (peak ~350k/s) → optimize the read path.
- ~500 B/record × 100 M/day × 5 yr ≈ ~90 TB → shard.
- base62, 7 chars = 62⁷ ≈ 3.5 trillion codes — plenty.
API
Data model
Only access pattern is get(short_code) -> long_url: no joins, no scans. → key-value store, sharded on short_code.
urls: short_code (PK) | long_url | created_at | expires_atHigh-level design
Read-optimized: cache in front, sharded KV behind, KGS for codes
Read path: LB → read service → Redis (hit ≈1 ms) → on miss, sharded KV + repopulate. Write path: write service → Key Generation Service for a unique code → store.
Deep dives
- Unique codes: a Key Generation Service pre-generates random unique 7-char keys offline (no runtime collisions, not guessable). Beats a sequential counter (bottleneck + enumerable) and beats hashing (collisions). See the case study for all three.
- Read scale (350k/s): link popularity follows a power law → 80–90%+ cache hit rate. LRU eviction; read replicas behind the cache; shard on
short_codewith consistent hashing. - 301 vs 302:
301(permanent) is browser-cached → fewer hits but no analytics;302(temporary) routes every click through you → analytics at the cost of traffic.
Analysis
- Redirect latency: ~1 ms on a cache hit, ~10 ms on a miss.
- Storage: ~90 TB over 5 years, spread across shards.
Same skin
- Pastebin — same shape, value is text instead of a URL; store the blob in object storage, metadata in the KV store.
- Distributed unique ID generation (Snowflake) — the KGS idea generalizes to any “give me a globally unique ID without coordination” problem.
- Any read-heavy point-lookup service — image/CDN key resolution, feature-flag lookup, session store.