Building Blocks

Every system you’ll design is assembled from the same handful of Lego bricks. Learn what each one does, when you reach for it, and — most importantly — what it costs you, and step 5 of the framework becomes snapping pieces together with a reason for each.

1. Load balancer

A load balancer (LB) sits in front of a pool of identical servers and spreads incoming requests across them. It’s the thing that makes horizontal scaling possible: when one server isn’t enough, you add more behind the LB. It also runs health checks, so it can route around any server that dies.

[Interactive demo: a load balancer spreading requests across four servers, srv-0 to srv-3.]

Round robin cycles 0→1→2→3→0… Perfectly even when every request costs the same.

The routing strategy is the interesting choice:

Strategy	How it picks	Best when
Round robin	next server in a cycle	requests cost roughly the same
Random	any server, uniformly	stateless, cheap, even over time
Least connections	the server with the fewest open requests	request costs vary a lot
IP hash / sticky	hash the client → always same server	you need session affinity
Consistent hashing	hash key → ring of servers	caching layers; minimizes reshuffling when servers change
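
The first few strategies in the table can be sketched in a few lines of Python. This is a minimal illustration, not how any particular load balancer implements them: the function names, the server names, and the choice of SHA-256 for the IP hash are all assumptions made for the example.

```python
import hashlib
import itertools


def round_robin(servers):
    """Yield servers in a fixed cycle: 0, 1, 2, ..., then back to 0."""
    return itertools.cycle(servers)


def least_connections(open_requests):
    """Pick the server with the fewest open requests.

    open_requests maps server name -> number of in-flight requests.
    """
    return min(open_requests, key=open_requests.get)


def ip_hash(client_ip, servers):
    """Map a client to the same server every time (session affinity)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["srv-0", "srv-1", "srv-2", "srv-3"]

rr = round_robin(servers)
first_five = [next(rr) for _ in range(5)]  # the fifth pick wraps back to srv-0

open_requests = {"srv-0": 7, "srv-1": 2, "srv-2": 5, "srv-3": 2}
pick = least_connections(open_requests)    # ties go to the first minimum: srv-1
```

Note that the modulo in `ip_hash` remaps most clients whenever the number of servers changes, which is the problem consistent hashing is meant to reduce.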

L4 vs L7. A layer-4 load balancer routes on TCP/IP info (IP + port) — fast and dumb. A layer-7 load balancer reads the HTTP request and can route on path, headers, or cookies (e.g. send /api/video/* to the video fleet) — smarter but does more work per request. Most modern setups use L7 (NGINX, Envoy, ALB) for the flexibility.
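
As a rough sketch of what "route on path" means at layer 7, the function below maps a request path to a backend pool by prefix. The routing table and the pool names are hypothetical; real proxies such as NGINX or Envoy express this in their own configuration formats rather than in Python.

```python
# Hypothetical routing table: path prefix -> backend pool name.
# More specific prefixes are listed first so they win.
ROUTES = [
    ("/api/video/", "video-fleet"),
    ("/api/", "api-fleet"),
]
DEFAULT_POOL = "web-fleet"


def choose_pool(path):
    """Return the pool for the first prefix that matches the request path."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

A layer-4 balancer could not make this decision, because it never looks past the IP address and port.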

⚠️

The load balancer is itself a single point of failure. If you have one LB and it dies, everything dies. Production setups run an active-passive (or active-active) pair with automatic failover, often fronted by DNS. Mention this in deep dives — interviewers love when you notice that the thing protecting you needs protecting too.

2. Cache

A cache is a small, fast store (usually in-memory, like Redis or Memcached) that holds the results of expensive operations so you don’t redo them. It’s the highest-leverage performance tool you have: a RAM hit at ~100 ns beats a DB read at ~10 ms by five orders of magnitude.
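
To see how the hit rate drives the payoff, here is a small back-of-the-envelope calculation using the two figures quoted above. It assumes a simplified model in which a miss costs the cache lookup plus the DB read; the function name is made up for the example.

```python
CACHE_HIT_S = 100e-9  # ~100 ns, the RAM figure quoted above
DB_READ_S = 10e-3     # ~10 ms, the DB figure quoted above


def average_read_latency(hit_rate):
    """Expected read latency in seconds for a given cache hit rate.

    Simplified model: every read pays the cache lookup, and a miss
    additionally pays a full DB read.
    """
    miss_rate = 1.0 - hit_rate
    return CACHE_HIT_S + miss_rate * DB_READ_S


speedup_on_hit = DB_READ_S / CACHE_HIT_S  # about 100,000x, i.e. five orders of magnitude
```

Under this model, a 99% hit rate gives an average of roughly 0.1 ms per read, and almost all of that time comes from the 1% of requests that miss.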

[Interactive demo: an LRU cache with capacity 3. Request a key and watch hits, misses, and eviction.]

A hit moves the key to the front (the MRU end). A miss on a full cache evicts the least-recently-used entry at the opposite (LRU) end.
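
The hit, miss, and eviction behavior described above can be sketched with Python's `collections.OrderedDict`. The class and method names here are illustrative assumptions, and a production cache such as Redis uses its own (approximate) eviction machinery.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self._items = OrderedDict()  # last item = most recently used

    def get(self, key):
        if key not in self._items:
            return None               # miss
        self._items.move_to_end(key)  # hit: mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def keys_mru_first(self):
        return list(reversed(self._items))


cache = LRUCache(capacity=3)
for key in ("a", "b", "c"):
    cache.put(key, key.upper())
cache.get("a")       # hit: "a" becomes most recently used
cache.put("d", "D")  # cache is full, so "b" (least recently used) is evicted
```

After these calls the cache holds `d`, `a`, `c` in most-recently-used order, and a lookup for `b` is a miss.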

Where caches live

[Figure: caching layers. Every hop is a chance to skip the next one.]

The further left (closer to the user) a request is answered, the faster and cheaper it is. The DB is the last resort: every cache layer in front of it exists to keep traffic off it.

Caching strategies (read and write paths)

Cache-aside (lazy loading) — app checks cache; on a miss, reads the DB and populates the cache. Most common. Downside: first read of any key is always a miss.
Read-through — the cache library itself fetches from the DB on a miss. App only ever talks to the cache.
Write-through — write to cache and DB together. Cache always fresh; writes are slower.
Write-back (write-behind) — write to cache, flush to DB asynchronously. Fast writes, but you can lose data if the cache dies before flushing.
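
A minimal sketch of two of the strategies above, cache-aside on the read path and write-through on the write path. Plain dictionaries stand in for the database and for Redis/Memcached, and the function names and the `stats` counter are assumptions made for the example.

```python
db = {"user:1": "Ada"}   # stand-in for the database
cache = {}               # stand-in for Redis/Memcached
stats = {"db_reads": 0}  # counts how often we fall through to the DB


def read_cache_aside(key):
    """Check the cache; on a miss, read the DB and populate the cache."""
    if key in cache:
        return cache[key]
    stats["db_reads"] += 1
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value


def write_through(key, value):
    """Write to the DB and the cache together so the cache stays fresh."""
    db[key] = value
    cache[key] = value
```

Reading the same key twice hits the DB only once, and a read that follows `write_through` sees the new value without touching the DB at all.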

⚠️

“There are only two hard things in computer science: cache invalidation and naming things.” When the underlying data changes, the cached copy is now a lie. Your options: TTL (let entries expire after N seconds — simple, but serves stale data until then) or explicit invalidation (delete/update the cache key on every write — correct, but easy to miss a write path). Most systems use both: TTL as a safety net, explicit invalidation for correctness. Discuss this trade-off in any read-heavy design.
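
Combining the two options might look like the sketch below: entries expire after a TTL, and the write path also deletes the key explicitly. The `TTLCache` class, the `update_record` helper, and the injectable `clock` parameter (used so expiry can be demonstrated without sleeping) are all assumptions for illustration.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)

    def invalidate(self, key):
        self._items.pop(key, None)


def update_record(db, cache, key, value):
    """Write path: update the DB, then drop the cached copy."""
    db[key] = value
    cache.invalidate(key)
```

If some write path forgets to call `invalidate`, the stale entry still disappears once its TTL runs out, which is the safety-net role described above.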

Two more caching hazards worth naming in an interview:

Thundering herd / cache stampede — a hot key expires and thousands of requests miss simultaneously, all slamming the DB at once. Fixes: lock-on-miss (one request refills, others wait), or stagger TTLs.
Eviction policy — when the cache is full, what gets thrown out? LRU (least recently used) is the default and what the demo above shows; LFU (least frequently used) and TTL-based are alternatives.
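
The two stampede fixes above can be sketched as follows. This is a single-process illustration that uses one global lock for simplicity; a real deployment would more likely use per-key or distributed locks. The names `get_with_lock`, `load_from_db`, and `jittered_ttl` are hypothetical.

```python
import random
import threading

cache = {}
refill_lock = threading.Lock()
stats = {"db_loads": 0}


def load_from_db(key):
    """Stand-in for an expensive query; counts how often it runs."""
    stats["db_loads"] += 1
    return f"value-for-{key}"


def get_with_lock(key):
    """Lock-on-miss: one caller refills the key, the rest reuse its result."""
    if key in cache:
        return cache[key]
    with refill_lock:
        if key in cache:  # another thread refilled while we waited
            return cache[key]
        cache[key] = load_from_db(key)
        return cache[key]


def jittered_ttl(base_seconds, spread=0.1):
    """Stagger expiry: the base TTL plus or minus up to `spread` (10% by default)."""
    return base_seconds * random.uniform(1 - spread, 1 + spread)
```

Even if many threads miss on the same hot key at once, only the first one through the lock runs the expensive load. Jittering the TTLs keeps keys that were cached together from all expiring in the same instant.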

3. CDN (Content Delivery Network)

A CDN is a globally distributed fleet of caching servers that store static content (images, video, CSS, JS) close to users. A user in Tokyo hits a Tokyo edge node instead of your Virginia origin — turning a 150 ms cross-Pacific round trip into a ~10 ms local one.

Reach for a CDN whenever you serve static or rarely-changing large files: images, video, downloads, front-end bundles. It also absorbs huge read traffic that would otherwise crush your origin (and blunts some DDoS attacks). The cost: another cache to invalidate, and it’s poor for highly dynamic, per-user content.

4. Database — SQL vs NoSQL

The most over-asked question in system design, and the one candidates most often get wrong by stating a preference instead of reasoning from the access pattern.

	SQL (relational)	NoSQL
Examples	PostgreSQL, MySQL	DynamoDB, Cassandra, MongoDB, Redis
Data model	tables, rows, fixed schema	key-value, document, wide-column, graph
Strength	joins, transactions (ACID), ad-hoc queries	horizontal scale, flexible schema, high write throughput
Consistency	strong by default	often eventual (tunable)
Scaling	hard — vertical first, then read replicas / sharding	designed to shard horizontally
Reach for it when…	relationships matter, you need transactions (money, orders), queries are varied	access is by-key, you need massive scale, schema evolves fast

The honest answer to “SQL or NoSQL?” is “what’s the access pattern?” Need multi-row transactions and joins (a bank, an e-commerce order)? SQL. Point lookups by key at huge scale with a simple shape (a URL shortener, a session store, a feed cache)? NoSQL. Modern reality: most large systems use both — SQL for the transactional core, NoSQL for the high-scale read paths. Saying “it depends, here’s what it depends on” beats picking a side.
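
To make "you need transactions" concrete, here is a small sketch of an atomic transfer using Python's built-in `sqlite3` module. SQLite is used only because it runs in memory with no setup; the table layout, account names, and `transfer` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically; return False and change nothing on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

The credit is deliberately applied before the debit: when the debit would overdraw the account, the whole transaction rolls back, so the credit is undone as well. That all-or-nothing behavior is what you would otherwise have to rebuild by hand in the application layer.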

⚠️

NoSQL is not “SQL but faster and free.” You trade away joins (you denormalize and duplicate data instead), often trade away strong consistency (eventual by default), and you must design the schema around your queries up front, because there are no ad-hoc JOINs to save you later. The win is scale; the cost is rigid access patterns (a query you didn’t plan for can be expensive or impossible) and more work in the application layer.

5. Message queue

A message queue (Kafka, RabbitMQ, SQS) sits between a producer and a consumer and decouples them in time. The producer drops a message and moves on; the consumer processes it whenever it’s ready.

A queue turns slow, spiky, or unreliable work into async work

The API responds instantly after enqueuing; a pool of workers drains the queue at its own pace. Add workers to drain faster; the queue absorbs traffic spikes so a burst doesn't overload the workers.

Reach for a queue when work is slow (video transcoding, sending email), spiky (a flood of uploads), or should survive a crash (the message persists until processed). It buys you: async responses (snappy API), load smoothing (the queue buffers spikes), and decoupling (producer and consumer scale independently). The cost: eventual consistency (the work isn’t done yet), and you must handle at-least-once delivery — the same message can arrive twice, so consumers must be idempotent.
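
One common way to make a consumer idempotent is to remember which message IDs have already been handled. The sketch below assumes every message carries a unique `id` field and keeps the seen-set in memory; in a real system that set would need durable storage, and the message shape here is hypothetical.

```python
from collections import deque

processed_ids = set()  # in production this would live in durable storage
emails_sent = []


def handle(message):
    """Process a message once, even if the queue delivers it twice."""
    if message["id"] in processed_ids:
        return False  # duplicate delivery: skip the side effect
    emails_sent.append(message["to"])  # stand-in for the real work
    processed_ids.add(message["id"])
    return True


# At-least-once delivery: message 1 shows up twice.
queue = deque([
    {"id": 1, "to": "ada@example.com"},
    {"id": 2, "to": "grace@example.com"},
    {"id": 1, "to": "ada@example.com"},
])
results = [handle(queue.popleft()) for _ in range(len(queue))]
```

Three deliveries arrive, but only two emails go out, because the redelivered message is recognized and skipped.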

6. Blob / object storage

For large unstructured files — images, video, backups, logs — you don’t put bytes in a database. You use object storage (S3, GCS, Azure Blob): cheap, effectively infinite, durable (“11 nines”), and accessed by key. The pattern: store the file in blob storage, store the URL/metadata in your database, and serve the file through a CDN.
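
The pattern above, sketched with dictionaries standing in for the object store and the database. The key scheme (a SHA-256 of the content), the `upload` helper, and the `cdn.example.com` hostname are all hypothetical choices made for the example.

```python
import hashlib

blob_store = {}   # stand-in for S3/GCS/Azure Blob: object key -> bytes
metadata_db = {}  # stand-in for a database table: file id -> row
CDN_BASE = "https://cdn.example.com"  # hypothetical CDN hostname


def upload(file_id, filename, data):
    """Store the bytes in blob storage and only the metadata in the DB."""
    key = f"uploads/{hashlib.sha256(data).hexdigest()}"
    blob_store[key] = data
    metadata_db[file_id] = {
        "filename": filename,
        "size": len(data),
        "blob_key": key,
        "url": f"{CDN_BASE}/{key}",  # clients fetch the file via the CDN
    }
    return metadata_db[file_id]
```

The database row stays small no matter how large the file is, and clients are handed a CDN URL instead of pulling bytes through the app servers.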

Putting the bricks together

Every brick, doing the one job it's good at

The CDN serves static files from blob storage. The LB spreads API traffic. App servers hit the cache first, SQL for transactional data, NoSQL for high-scale reads, and offload slow work to a queue drained by workers. No single brick does everything — each one does the job it's best at.

Quick check

Your service does expensive video transcoding when a user uploads a clip. Users complain the upload button 'hangs' for 30 seconds. What's the right building block?

A read-heavy app adds a cache and sees a huge speedup, but users occasionally see stale data after an update. What's the core trade-off they're hitting?

An interviewer asks 'SQL or NoSQL for a banking ledger that must never double-spend?' What's the strong answer?

Next: Scaling & Trade-offs — vertical vs horizontal, replication, sharding, consistent hashing, and the CAP theorem with an interactive triangle.
