Building Blocks
Every system you’ll design is assembled from the same handful of Lego bricks. Learn what each one does, when you reach for it, and — most importantly — what it costs you, and step 5 of the framework becomes snapping pieces together with a reason for each.
1. Load balancer
A load balancer sits in front of a pool of identical servers and spreads incoming requests across them. It’s the thing that makes horizontal scaling possible: when one server isn’t enough, you add more behind the LB. It also runs health checks against each server and routes around any that die.
The routing strategy is the interesting choice:
| Strategy | How it picks | Best when |
|---|---|---|
| Round robin | next server in a cycle | requests cost roughly the same |
| Random | any server, uniformly | servers are stateless and requests are cheap; load evens out over time |
| Least connections | the server with the fewest open connections | request costs vary a lot |
| IP hash / sticky | hash the client → always same server | you need session affinity |
| Consistent hashing | hash key → ring of servers | caching layers; minimizes reshuffling when servers change |
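Three of the strategies in the table can be sketched in a few lines each. This is a minimal illustration, not how any particular load balancer implements them; the class names and the `pick`/`release` methods are hypothetical.

```python
import hashlib
import itertools


class RoundRobin:
    """Cycle through the servers in order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.open = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        server = min(self.open, key=self.open.get)
        self.open[server] += 1  # caller must call release() when the request ends
        return server

    def release(self, server):
        self.open[server] -= 1


class IpHash:
    """Hash the client IP so the same client always lands on the same server."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]
```

Note that the modulo in `IpHash` remaps most clients whenever the server list changes size, which is the reshuffling that consistent hashing is meant to minimize.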
L4 vs L7. A layer-4 load balancer routes on TCP/IP info (IP + port) — fast and dumb. A layer-7 load balancer reads the HTTP request and can route on path, headers, or cookies (e.g. send /api/video/* to the video fleet) — smarter but does more work per request. Most modern setups use L7 (NGINX, Envoy, ALB) for the flexibility.
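As a rough sketch of what "routing on path" means at layer 7, the function below maps a request path to a server pool by prefix. The pool names and the `ROUTES` table are made up for illustration; real proxies such as NGINX or Envoy express this in their own configuration formats.

```python
# Hypothetical routing table: (path prefix, pool name), most specific first.
ROUTES = [
    ("/api/video/", "video-fleet"),
    ("/api/", "api-fleet"),
]
DEFAULT_POOL = "web-fleet"


def choose_pool(path: str) -> str:
    """Return the pool for the first matching prefix, else the default pool."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

A layer-4 balancer could not make this decision, because it never parses the HTTP request and so never sees the path.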
The load balancer is itself a single point of failure. If you have one LB and it dies, everything dies. Production setups run an active-passive (or active-active) pair with automatic failover, often fronted by DNS. Mention this in deep dives — interviewers love when you notice that the thing protecting you needs protecting too.
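The failover decision itself is simple to sketch. The example below assumes a priority-ordered list of load balancers and a caller-supplied health probe; both `active_lb` and `is_healthy` are hypothetical names, and real setups delegate this to tooling such as a floating IP or DNS health checks.

```python
def active_lb(load_balancers, is_healthy):
    """Return the first healthy LB in priority order (active first, then standby)."""
    for lb in load_balancers:
        if is_healthy(lb):
            return lb
    raise RuntimeError("no healthy load balancer available")


# Usage: the primary has failed its health check, so traffic moves to the standby.
down = {"lb-primary"}
current = active_lb(["lb-primary", "lb-standby"], lambda lb: lb not in down)
```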
2. Cache
A cache is a small, fast store (usually in-memory, like Redis or Memcached) that holds the results of expensive operations so you don’t redo them. It’s the highest-leverage performance tool you have: a RAM hit at ~100 ns beats a DB read at ~10 ms by five orders of magnitude.
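To see why the hit ratio matters so much, here is a back-of-the-envelope calculation using the two figures above. It assumes a miss pays for the cache lookup and then the DB read; the function name is illustrative.

```python
CACHE_HIT_S = 100e-9  # ~100 ns RAM hit, from the text above
DB_READ_S = 10e-3     # ~10 ms DB read, from the text above


def average_read_latency(hit_ratio: float) -> float:
    """Expected seconds per read: every read checks the cache, misses also hit the DB."""
    return CACHE_HIT_S + (1 - hit_ratio) * DB_READ_S


for ratio in (0.0, 0.9, 0.99):
    print(f"hit ratio {ratio:.2f}: {average_read_latency(ratio) * 1e3:.3f} ms")
```

At a 90% hit ratio the average read costs about 1 ms, and at 99% about 0.1 ms. The misses dominate the average, so the last few points of hit ratio are worth more than they look.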
Caching strategies (read and write paths)
- Cache-aside (lazy loading) — app checks cache; on a miss, reads the DB and populates the cache. Most common. Downside: first read of any key is always a miss.
- Read-through — the cache library itself fetches from the DB on a miss. App only ever talks to the cache.
- Write-through — write to cache and DB together. Cache always fresh; writes are slower.
- Write-back (write-behind) — write to cache, flush to DB asynchronously. Fast writes, but you can lose data if the cache dies before flushing.
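The strategies above differ mainly in who touches the DB and when. The sketch below uses plain dicts as stand-ins for the cache and the database; the `Store` class and its method names are assumptions for illustration only.

```python
class Store:
    def __init__(self):
        self.db = {}        # stand-in for the slow, durable database
        self.cache = {}     # stand-in for Redis/Memcached
        self.dirty = set()  # keys written to the cache but not yet to the DB
        self.db_reads = 0

    def read_cache_aside(self, key):
        """Cache-aside: check the cache, fall back to the DB, then populate."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write_through(self, key, value):
        """Write-through: update DB and cache together, so the cache stays fresh."""
        self.db[key] = value
        self.cache[key] = value

    def write_back(self, key, value):
        """Write-back: update only the cache now and remember to flush later."""
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Persist pending write-back entries; anything unflushed is lost on a crash."""
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

Read-through has the same logic as `read_cache_aside`, except that it lives inside the cache library rather than in application code.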
“There are only two hard things in computer science: cache invalidation and naming things.” When the underlying data changes, the cached copy is now a lie. Your options: TTL (let entries expire after N seconds — simple, but serves stale data until then) or explicit invalidation (delete/update the cache key on every write — correct, but easy to miss a write path). Most systems use both: TTL as a safety net, explicit invalidation for correctness. Discuss this trade-off in any read-heavy design.
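A minimal sketch of combining the two approaches: entries carry an expiry time as the safety net, and the write path deletes the key explicitly. `TtlCache` and `update_user` are hypothetical names, and the injectable `clock` exists only so the expiry can be demonstrated without sleeping.

```python
import time


class TtlCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def invalidate(self, key):
        self._entries.pop(key, None)


def update_user(db, cache, user_id, name):
    """Write path: update the source of truth, then drop the stale cached copy."""
    db[user_id] = name
    cache.invalidate(user_id)
```

If some other write path forgets to call `invalidate`, the TTL still bounds how long the stale value can be served.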
Two more caching hazards worth naming in an interview:
- Thundering herd / cache stampede — a hot key expires and thousands of requests miss simultaneously, all slamming the DB at once. Fixes: lock-on-miss (one request refills, others wait), or stagger TTLs.
- Eviction policy — when the cache is full, what gets thrown out? LRU (least recently used) is the usual default; LFU (least frequently used) and TTL-based eviction are alternatives.
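Both hazards can be sketched with the standard library. The example below assumes a single process with threads: `LruCache` evicts the least recently used key, `get_with_lock` lets one caller refill a missing key while the others wait, and `jittered_ttl` staggers expiry times. All three names are illustrative, and the single global refill lock is a simplification (a real system would more likely lock per key, often in a shared store).

```python
import random
import threading
from collections import OrderedDict


class LruCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def put(self, key, value):
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict the least recently used


_refill_lock = threading.Lock()


def get_with_lock(cache, key, load_from_db):
    """Lock-on-miss: only one caller reloads a missing key; the rest wait."""
    value = cache.get(key)
    if value is not None:
        return value
    with _refill_lock:
        value = cache.get(key)  # someone may have refilled while we waited
        if value is None:
            value = load_from_db(key)
            cache.put(key, value)
        return value


def jittered_ttl(base_seconds, spread=0.1):
    """Stagger TTLs by +/- spread so hot keys do not all expire together."""
    return base_seconds * random.uniform(1 - spread, 1 + spread)
```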
3. CDN (Content Delivery Network)
A CDN is a globally distributed fleet of caching servers that store static content (images, video, CSS, JS) close to users. A user in Tokyo hits a Tokyo edge node instead of your Virginia origin — turning a 150 ms cross-Pacific round trip into a ~10 ms local one.
Reach for a CDN whenever you serve static or rarely-changing large files: images, video, downloads, front-end bundles. It also absorbs huge read traffic that would otherwise crush your origin (and blunts some DDoS attacks). The cost: another cache to invalidate, and it’s poor for highly dynamic, per-user content.
4. Database — SQL vs NoSQL
The most over-asked question in system design, and the one candidates most often get wrong by stating a preference instead of reasoning from the access pattern.
| | SQL (relational) | NoSQL |
|---|---|---|
| Examples | PostgreSQL, MySQL | DynamoDB, Cassandra, MongoDB, Redis |
| Data model | tables, rows, fixed schema | key-value, document, wide-column, graph |
| Strength | joins, transactions (ACID), ad-hoc queries | horizontal scale, flexible schema, high write throughput |
| Consistency | strong by default | often eventual (tunable) |
| Scaling | hard — vertical first, then read replicas / sharding | designed to shard horizontally |
| Reach for it when… | relationships matter, you need transactions (money, orders), queries are varied | access is by-key, you need massive scale, schema evolves fast |
The honest answer to “SQL or NoSQL?” is “what’s the access pattern?” Need multi-row transactions and joins (a bank, an e-commerce order)? SQL. Point lookups by key at huge scale with a simple shape (a URL shortener, a session store, a feed cache)? NoSQL. Modern reality: most large systems use both — SQL for the transactional core, NoSQL for the high-scale read paths. Saying “it depends, here’s what it depends on” beats picking a side.
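The two access patterns look quite different in code. The sketch below uses Python's built-in `sqlite3` as a stand-in for a relational database and a plain dict as a stand-in for a key-value store; the table, account names, and short code are invented for illustration.

```python
import sqlite3

# Transactional pattern: two rows must change together or not at all.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically; an overdraft rolls back the credit as well."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected a negative balance
        return False


# Key-lookup pattern: one key in, one value out, no joins or transactions.
short_urls = {"abc123": "https://example.com/some/long/path"}


def resolve(code):
    return short_urls.get(code)
```

The transfer needs the database to guarantee atomicity across rows; the URL lookup needs nothing beyond a fast read by key, which is why it shards so easily.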
NoSQL is not “SQL but faster and free.” You trade away joins (you denormalize and duplicate data instead), often trade away strong consistency (eventual by default), and you must design the schema around your queries up front — there are no ad-hoc JOINs to save you later. The win is scale; the cost is rigidity and doing more work in the application layer.
5. Message queue
A message queue (Kafka, RabbitMQ, SQS) sits between a producer and a consumer and decouples them in time. The producer drops a message and moves on; the consumer processes it whenever it’s ready.
Reach for a queue when work is slow (video transcoding, sending email), spiky (a flood of uploads), or should survive a crash (the message persists until processed). It buys you: async responses (snappy API), load smoothing (the queue buffers spikes), and decoupling (producer and consumer scale independently). The cost: eventual consistency (the work isn’t done yet), and you must handle at-least-once delivery — the same message can arrive twice, so consumers must be idempotent.
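Here is a minimal sketch of an idempotent consumer, using an in-memory `deque` in place of a real broker. The message shape and the function names are assumptions; in a real system the set of processed IDs would need to be durable, and ideally recorded atomically with the side effect.

```python
from collections import deque

queue = deque()        # stand-in for Kafka / RabbitMQ / SQS
processed_ids = set()  # IDs of messages whose work is already done
emails_sent = []       # the side effect we must not repeat


def publish(message_id, payload):
    """Producer: enqueue and return immediately."""
    queue.append({"id": message_id, "payload": payload})


def handle(message):
    """Consumer: skip any message that has been processed before."""
    if message["id"] in processed_ids:
        return False  # duplicate delivery, nothing to do
    emails_sent.append(message["payload"])
    processed_ids.add(message["id"])
    return True


def drain():
    while queue:
        handle(queue.popleft())


# At-least-once delivery: message m1 arrives twice, but only one email goes out.
publish("m1", "welcome alice")
publish("m1", "welcome alice")
publish("m2", "welcome bob")
drain()
```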
6. Blob / object storage
For large unstructured files — images, video, backups, logs — you don’t put bytes in a database. You use object storage (S3, GCS, Azure Blob): cheap, effectively infinite, durable (“11 nines”), and accessed by key. The pattern: store the file in blob storage, store the URL/metadata in your database, and serve the file through a CDN.
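The pattern in the last sentence can be sketched as follows, with dicts standing in for the object store and the database. The `cdn.example.com` host, the key layout, and the function names are all hypothetical.

```python
import hashlib

blob_store = {}   # stand-in for S3 / GCS / Azure Blob: key -> bytes
metadata_db = {}  # stand-in for a database table: file_id -> row
CDN_BASE = "https://cdn.example.com"  # hypothetical CDN hostname


def upload(file_id, filename, data: bytes):
    """Put the bytes in object storage and only the pointer in the database."""
    key = f"uploads/{hashlib.sha256(data).hexdigest()}"
    blob_store[key] = data
    metadata_db[file_id] = {"filename": filename, "blob_key": key, "size": len(data)}
    return key


def public_url(file_id):
    """Clients fetch the file via the CDN, never from the database."""
    return f"{CDN_BASE}/{metadata_db[file_id]['blob_key']}"
```

The database row stays small and queryable while the large payload lives where storage is cheap. Deriving the key from a content hash is one possible choice, not a requirement of the pattern.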