Introduction to System Design

A coding problem has a hidden test suite — there’s a right answer and you either hit it or you don’t. System design has no test suite. It’s the art of arranging components — servers, databases, caches, queues — so that a product works correctly, stays fast, survives failures, and keeps doing all three as traffic grows from a thousand users to a hundred million.

The interview is open-ended on purpose. Real engineering is open-ended. The interviewer wants to see whether you can take a fuzzy goal and impose structure on it — ask the right questions, make defensible choices, and notice the moment a choice creates a new problem.

The one mental model

Every system you’ll design moves data between three places and does work on it:

Compute, storage, and the network — that's the whole game

Compute is cheap and stateless — add more boxes. Storage is stateful and hard to scale — that's where the interesting decisions live. The network is the slow, unreliable thing connecting them, and it's why distributed systems are hard.

Almost every design decision is a bet about one of three resources:

Compute — CPU. Stateless and easy: when you need more, you add identical machines behind a load balancer.
Storage — disk and memory. Stateful and hard. You can’t just “add a box” — the data has to live somewhere specific, and keeping copies in sync is the source of most distributed-systems pain.
Network — the wires between machines. Slow, lossy, and the reason a single-machine program is a thousand times easier to reason about than a distributed one.

Hold onto that. When a design feels overwhelming, ask: which resource am I running out of, and what’s the cheapest way to get more of it?

The vocabulary you must speak fluently

Interviewers listen for these words. Misusing them is a tell that you’ve read blog posts but not shipped systems.

Term	Plain-English meaning	How it’s measured
Latency	How long one request takes	milliseconds, usually reported as p50 / p99
Throughput	How many requests per second the system handles	QPS (queries/sec), RPS, or bandwidth
Availability	Fraction of time the system is up and answering	“nines”: 99.9% ≈ 8.77 h down/year
Reliability	Probability it does the right thing, not just a thing	error rate, durability guarantees
Scalability	Can it keep its latency/throughput as load grows?	how cost grows with traffic
Consistency	Do all readers see the same, latest data?	strong / eventual / causal
Durability	Once written, does data survive crashes?	“11 nines” for S3, replication factor

Latency and throughput are not the same thing, and they trade off. A highway analogy: latency is how long your car takes to drive end-to-end; throughput is how many cars cross per minute. Adding lanes (parallelism) raises throughput without helping any single car’s latency. Raising the speed limit lowers latency. Batching requests often raises throughput while hurting per-request latency. Know which one the question cares about.
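The highway analogy can be put into numbers. The sketch below is a toy model, not a benchmark: it assumes each "lane" is a worker that serves one request at a time, and that batching pays a fixed per-batch overhead shared by every item in the batch. The function names and the figures passed to them are illustrative assumptions.

```python
def throughput_rps(lanes: int, latency_s: float) -> float:
    """Requests per second when each lane serves one request at a time."""
    return lanes / latency_s


def batched(per_item_s: float, overhead_s: float, batch_size: int) -> tuple[float, float]:
    """Return (latency_s, throughput_rps) for one worker that processes batches.

    Assumes a fixed per-batch overhead (for example one network round trip)
    and that every item waits until its whole batch has finished.
    """
    batch_time_s = overhead_s + per_item_s * batch_size
    return batch_time_s, batch_size / batch_time_s


# Adding lanes: throughput grows, but each request still takes 10 ms.
print(throughput_rps(lanes=1, latency_s=0.010))
print(throughput_rps(lanes=4, latency_s=0.010))

# Batching: 1 ms of overhead per batch, 0.1 ms of work per item.
for size in (1, 50):
    latency_s, rps = batched(per_item_s=0.0001, overhead_s=0.001, batch_size=size)
    print(f"batch={size}: latency {latency_s * 1000:.1f} ms, throughput {rps:.0f} rps")
```

In this model a batch of 50 raises throughput roughly ninefold while each request waits about 6 ms instead of about 1 ms, which is the trade-off the paragraph describes.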

Availability, in “nines”

Availability is quoted as a percentage of uptime, and small-looking differences are huge:

Availability	Downtime per year	Downtime per day	Typical tier
99% (“two nines”)	3.65 days	14.4 min	hobby project
99.9% (“three nines”)	8.77 hours	1.44 min	most SaaS
99.99% (“four nines”)	52.6 min	8.6 s	serious infra
99.999% (“five nines”)	5.26 min	0.86 s	telecom / payments

As a rule of thumb, each extra nine costs roughly 10× more to achieve: redundant everything, multi-region, automatic failover. Part of the interview is knowing not to over-engineer: a photo-sharing app does not need five nines.
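The downtime figures in the table follow from one line of arithmetic. A minimal sketch, assuming an average year of 365.25 days (the constant and the function name are illustrative choices, not part of any standard):

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length, leap years included


def downtime_seconds(availability_pct: float, period_days: float) -> float:
    """Downtime budget in seconds for a given availability over a period."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


for pct in (99.0, 99.9, 99.99, 99.999):
    per_year_s = downtime_seconds(pct, DAYS_PER_YEAR)
    per_day_s = downtime_seconds(pct, 1)
    print(f"{pct}%: {per_year_s / 3600:.2f} h/year, {per_day_s:.2f} s/day")
```

Framing availability as a downtime budget makes the trade-off concrete: at four nines, a single 53-minute outage uses up the whole year.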

Why the network changes everything: the latency ladder

The single most clarifying fact in system design is how wildly different the access times are for cache vs RAM vs SSD vs disk vs network. These numbers explain why we cache, why we keep data in the same datacenter, and why a chatty design that makes 50 sequential network calls feels broken.

Latency numbers every engineer should know

The takeaway isn't the digits. It's the orders of magnitude between RAM, SSD, disk, and the network.

Operation	Latency	Note
L1 cache reference	1 ns	≈ 1 heartbeat of the CPU
Branch mispredict	3 ns
L2 cache reference	4 ns
Mutex lock / unlock	17 ns
Main memory (RAM) reference	100 ns	100× slower than L1
Compress 1 KB (Zippy)	2 µs
Read 1 MB sequentially from RAM	3 µs
Send 1 KB over 1 Gbps network	10 µs
Read 4 KB randomly from SSD	16 µs
Read 1 MB sequentially from SSD	49 µs
Round trip within same datacenter	500 µs	0.5 ms
Read 1 MB sequentially from disk (HDD)	825 µs
Disk seek (HDD)	2 ms
Round trip CA ↔ Netherlands	150 ms	speed of light is the limit

Three lessons fall straight out of that ladder:

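RAM is roughly 100× faster than SSD (a 100 ns memory reference against a 16 µs random SSD read), and SSD is roughly 20× faster than spinning disk (49 µs against 825 µs for a sequential 1 MB read). This is the entire justification for caching: keep hot data in memory and you skip two orders of magnitude of latency.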
A cross-continent round trip (~150 ms) is governed by the speed of light — you literally cannot make it faster. The fix is don’t make the trip: put a CDN or replica near the user.
Sequential is far faster than random, on every medium. Reading 1 MB sequentially from SSD (49 µs) beats 250 random 4 KB reads of the same data (250 × 16 µs = 4 ms) by about 80×. This is why databases love sequential writes (append-only logs, LSM-trees) and hate random I/O.
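The ratios behind these three lessons can be checked directly from the latency numbers above. A small sketch; the dictionary keys and the `ratio` helper are names made up for this illustration, and the values are the ones quoted in the ladder:

```python
# Latencies from the ladder above, in microseconds.
LATENCY_US = {
    "ram_reference": 0.1,
    "ssd_random_4kb": 16,
    "ssd_sequential_1mb": 49,
    "hdd_sequential_1mb": 825,
    "datacenter_round_trip": 500,
    "ca_netherlands_round_trip": 150_000,
}


def ratio(slow: str, fast: str) -> float:
    """How many times slower `slow` is than `fast`."""
    return LATENCY_US[slow] / LATENCY_US[fast]


# Lesson 1: memory vs SSD vs spinning disk.
print(f"SSD random read vs RAM reference: {ratio('ssd_random_4kb', 'ram_reference'):.0f}x")
print(f"HDD vs SSD, sequential 1 MB: {ratio('hdd_sequential_1mb', 'ssd_sequential_1mb'):.0f}x")

# Lesson 2: leaving the datacenter.
print(f"Cross-continent vs in-datacenter round trip: "
      f"{ratio('ca_netherlands_round_trip', 'datacenter_round_trip'):.0f}x")

# Lesson 3: 250 random 4 KB reads vs one sequential 1 MB read on SSD.
random_total_us = 250 * LATENCY_US["ssd_random_4kb"]
print(f"250 random reads: {random_total_us} us, "
      f"one sequential read: {LATENCY_US['ssd_sequential_1mb']} us")
```

The exact values matter less than their size: every comparison lands between one and three orders of magnitude.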

⚠️

The chatty-design smell. If your design makes the client (or a server) wait on 30 sequential round trips to render one page, you’ve built something that feels slow no matter how fast each service is: 30 × 0.5 ms in-datacenter = 15 ms of pure network waiting before any service has done real work, and far more if any hop leaves the datacenter. The fixes (batching, parallel fan-out, caching, denormalization) all show up later in this chapter. Recognizing the smell is step one.
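The cost of chattiness is easy to estimate. The sketch below counts only network waiting and assumes the calls are independent, so they can be batched or issued in parallel; calls that depend on each other's results cannot be collapsed this way. The `wait_ms` helper is a hypothetical name, and the round-trip times come from the latency numbers earlier on this page.

```python
import math


def wait_ms(calls: int, rtt_ms: float, per_round_trip: int = 1) -> float:
    """Network wait when `per_round_trip` calls share each round trip.

    per_round_trip=1 models strictly sequential calls. Larger values model
    batching or parallel fan-out of independent calls.
    """
    round_trips = math.ceil(calls / per_round_trip)
    return round_trips * rtt_ms


print(wait_ms(30, 0.5))                     # sequential, same datacenter
print(wait_ms(30, 0.5, per_round_trip=10))  # three waves of ten calls
print(wait_ms(30, 150.0))                   # sequential, cross-continent
```

Sequential in-datacenter calls cost 15 ms here, fanning out ten at a time cuts that to 1.5 ms, and the same 30 calls across continents take 4.5 seconds.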

What “design X” is really asking

When an interviewer says “design a system like Instagram,” they are not asking you to rebuild Instagram. They’re asking you to:

Scope it down

Instagram is enormous. You’ll pick 2–3 core features (post a photo, view a feed, follow a user) and explicitly defer the rest (stories, DMs, ads, reels). Scoping is the first thing they grade.

Reason about scale

100 M daily users posting photos is a different machine than 100 users. You’ll estimate the numbers (next page) so your choices are grounded, not hand-waved.

Make and defend choices

SQL or NoSQL? Push the feed or pull it? Cache here or there? Every choice has a cost. Saying “I’ll use NoSQL” is worthless; saying “I’ll use NoSQL because the access pattern is key-by-user-id with no joins, and I’d rather scale writes horizontally than maintain referential integrity” is the whole game.

Find the bottleneck and address it

A good design names its own weak point before the interviewer does: “The feed-fan-out write will be the bottleneck for celebrity accounts — here’s how I’d handle that with a hybrid push/pull model.”
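One way to picture the hybrid push/pull idea is the in-memory sketch below. It is an illustration under assumptions, not how any real product does it: the `HybridFeed` class, its method names, and the follower threshold are all hypothetical, and a real system would keep these structures in databases and caches rather than Python dictionaries.

```python
from collections import defaultdict


class HybridFeed:
    """Push posts to followers' inboxes, except for high-follower authors."""

    def __init__(self, celebrity_threshold: int = 10_000):
        # Hypothetical cutoff; a real value would come from measured load.
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)         # author -> users following them
        self.following = defaultdict(set)         # user -> authors they follow
        self.posts_by_author = defaultdict(list)  # author -> [(timestamp, post_id)]
        self.inbox = defaultdict(list)            # user -> pushed (timestamp, post_id)

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def publish(self, author: str, post_id: str, timestamp: int) -> None:
        self.posts_by_author[author].append((timestamp, post_id))
        if not self._is_celebrity(author):
            # Push (fan-out on write): cheap when the follower list is small.
            for user in self.followers[author]:
                self.inbox[user].append((timestamp, post_id))

    def read_feed(self, user: str, limit: int = 20) -> list:
        # A set removes duplicates if an author crossed the threshold
        # after some of their posts had already been pushed.
        entries = set(self.inbox[user])
        # Pull (fan-out on read): fetch celebrity posts only when needed.
        for author in self.following[user]:
            if self._is_celebrity(author):
                entries.update(self.posts_by_author[author])
        newest_first = sorted(entries, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]


feed = HybridFeed(celebrity_threshold=2)
feed.follow("ann", "star")
feed.follow("bob", "star")
feed.follow("ann", "bob")
feed.publish("bob", "b1", timestamp=1)   # pushed into ann's inbox
feed.publish("star", "s1", timestamp=2)  # stored once, pulled at read time
print(feed.read_feed("ann"))  # ['s1', 'b1']
```

The design choice is to pay the write cost only where it is small: an ordinary post touches a handful of inboxes, while a celebrity post is written once and merged in when a follower reads. The sketch ignores cases such as an author dropping back below the threshold.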

Functional vs non-functional requirements

Lock this distinction in — it structures the entire first phase of the interview.

Functional requirements — what the system does. “A user can shorten a URL.” “A user can view their feed.” These become your API.
Non-functional requirements — how well it does it. “99.99% available.” “p99 latency under 200 ms.” “Scales to 10 M DAU.” These drive your architecture.
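A non-functional requirement such as “p99 latency under 200 ms” is something you can check against measurements. A minimal sketch using the nearest-rank percentile definition; the function names and the sample data are invented for illustration, and production monitoring systems often use other estimators.

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the smallest sample that is greater than
    or equal to at least p percent of all samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_slo(samples_ms: list, p: float = 99, limit_ms: float = 200) -> bool:
    """True if the p-th percentile latency is under the limit."""
    return percentile(samples_ms, p) < limit_ms


# Made-up measurements: 98 fast requests, one slow, one very slow.
latencies_ms = [20] * 98 + [150, 900]
print(percentile(latencies_ms, 50))     # 20
print(percentile(latencies_ms, 99))     # 150
print(meets_latency_slo(latencies_ms))  # True
```

The 900 ms outlier does not break a p99 target on its own, which is why the percentile you pick is itself a design decision.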

Beginners obsess over functional requirements and draw a correct-but-naive design. Strong candidates spend most of their energy on the non-functional ones — because that’s where the hard, interesting, gradeable engineering lives.

Quick check

A design makes 40 sequential calls to services within the same datacenter to render one page. Each call is fast (~0.5 ms round trip). Why might the page still feel slow?

Why is storage harder to scale than compute?

Your interviewer asks you to design a personal note-taking app for ~500 users. What availability target is appropriate?

Next: The Framework — the 6-step script you’ll run on every question, plus back-of-the-envelope estimation with a live calculator.
