Design a Chat App (WhatsApp / Messenger) medium
The prompt
Two users exchange messages in real time: 1:1 chat, delivery within a second, message history preserved, and the sender sees sent/delivered/read receipts. The defining challenge is server-initiated, real-time delivery — the server must push to the recipient, not wait to be polled.
Requirements
- Functional: send/receive 1:1 messages in real time; persist history; show online presence and delivery/read receipts.
- Non-functional: low latency (sub-second delivery), reliable (no lost messages, even if the recipient is offline), ordered (messages appear in send order within a conversation).
Estimation
500 M DAU, ~40 messages each/day → 20 B messages/day ≈ 230k/s (peak higher). Each message is small (~100s of bytes). The harder number is concurrent connections: hundreds of millions of users hold a persistent connection — that’s the real scaling pressure, not raw QPS.
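The arithmetic behind those figures can be checked in a few lines. This is only a back-of-envelope sketch: the DAU and per-user message counts come from the estimate above, while `ASSUMED_BYTES_PER_MESSAGE` is an illustrative value picked from the "100s of bytes" range, not a number the estimate states.

```python
# Back-of-envelope check of the estimate above.
DAU = 500_000_000
MESSAGES_PER_USER_PER_DAY = 40
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = DAU * MESSAGES_PER_USER_PER_DAY
avg_messages_per_second = messages_per_day / SECONDS_PER_DAY

# Assumption for illustration only: 200 bytes per message.
ASSUMED_BYTES_PER_MESSAGE = 200
payload_tb_per_day = messages_per_day * ASSUMED_BYTES_PER_MESSAGE / 1e12

print(f"{messages_per_day:,} messages/day")
print(f"{avg_messages_per_second:,.0f} messages/s on average")
print(f"{payload_tb_per_day:.1f} TB/day of raw payload (under the assumed size)")
```

The average works out to roughly 231k messages per second, which matches the ≈ 230k/s figure; peak traffic would sit above that average.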
The core decision: how does the server push to the client?
| Approach | How | Verdict |
|---|---|---|
| Short polling | client asks “anything new?” every few seconds | wasteful, laggy — most polls return nothing |
| Long polling | request hangs open until there’s data or timeout | better, but every response or timeout forces a new request, so reconnect churn is high |
| WebSocket | one persistent, bidirectional TCP connection | the answer — server pushes instantly over an open socket |
WebSockets are the heart of any chat design. Unlike HTTP request/response, a WebSocket is a long-lived, two-way pipe: once open, the server can push a message to the client the instant it arrives, with no polling. The cost is stateful connections — each gateway server holds many open sockets and must remember which user is connected to which server, which is the central scaling problem below.
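To make the "which user is connected to which server" bookkeeping concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: `Gateway` and `SessionRegistry` are illustrative names, a plain Python list stands in for each open WebSocket, and a dict stands in for the shared registry. No real WebSocket library is involved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Gateway:
    """One gateway server. Each user's list stands in for an open WebSocket."""
    gateway_id: str
    sockets: dict = field(default_factory=dict)  # user_id -> list of pushed payloads

    def push(self, user_id: str, payload: str) -> None:
        # Server-initiated delivery: write to the already-open connection.
        self.sockets[user_id].append(payload)


class SessionRegistry:
    """Shared user_id -> gateway_id map (in-memory stand-in for a shared store)."""

    def __init__(self) -> None:
        self._location = {}

    def connect(self, user_id: str, gateway: Gateway) -> None:
        gateway.sockets[user_id] = []
        self._location[user_id] = gateway.gateway_id

    def disconnect(self, user_id: str, gateway: Gateway) -> None:
        gateway.sockets.pop(user_id, None)
        # Only clear the entry if it still points at this gateway, so a stale
        # disconnect cannot erase a newer connection made elsewhere.
        if self._location.get(user_id) == gateway.gateway_id:
            del self._location[user_id]

    def locate(self, user_id: str) -> Optional[str]:
        return self._location.get(user_id)


registry = SessionRegistry()
gw1, gw2 = Gateway("gw-1"), Gateway("gw-2")
gateways = {g.gateway_id: g for g in (gw1, gw2)}

registry.connect("alice", gw1)
registry.connect("bob", gw2)

# Any server can now find Bob's socket without knowing the topology in advance.
target = registry.locate("bob")
gateways[target].push("bob", "hi bob")
```

The point of the sketch is the separation of concerns: gateways own the sockets, while the registry only answers "where is this user connected right now?".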
High-level design
Sending a message:
- User A sends over their WebSocket to Gateway 1 → Message Service.
- Message Service persists it (history + the “delivered later” guarantee).
- It looks up B in the session registry: which gateway holds B’s socket?
- If B is online → route to B’s gateway → push over B’s socket. If B is offline → store as undelivered and fire a push notification; deliver when B reconnects.
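The four steps above can be sketched as a single `send` method. This is a simplified illustration under stated assumptions: `MessageService`, its field names, and `on_reconnect` are hypothetical, dicts and lists stand in for the durable store, the session registry, the gateways, and the push-notification provider, and failure handling is omitted.

```python
import uuid
from collections import defaultdict


class MessageService:
    def __init__(self):
        self.history = {}                     # message_id -> message (stand-in for durable storage)
        self.undelivered = defaultdict(list)  # recipient -> [message_id, ...]
        self.registry = {}                    # user_id -> gateway_id (session registry stand-in)
        # gateway_id -> user_id -> messages pushed over that user's socket
        self.gateways = defaultdict(lambda: defaultdict(list))
        self.push_notifications = []          # recipients we notified out of band

    def send(self, sender, recipient, text):
        message = {"id": str(uuid.uuid4()), "from": sender, "to": recipient, "text": text}

        # 1. Persist first, so the message survives a crash before delivery.
        self.history[message["id"]] = message

        # 2. Look up which gateway (if any) holds the recipient's socket.
        gateway_id = self.registry.get(recipient)

        if gateway_id is not None:
            # 3a. Online: route to that gateway and push over the open socket.
            self.gateways[gateway_id][recipient].append(message)
        else:
            # 3b. Offline: remember it as undelivered and fire a push notification.
            self.undelivered[recipient].append(message["id"])
            self.push_notifications.append(recipient)
        return message["id"]

    def on_reconnect(self, user_id, gateway_id):
        # Register the new connection, then drain anything stored while offline.
        self.registry[user_id] = gateway_id
        for message_id in self.undelivered.pop(user_id, []):
            self.gateways[gateway_id][user_id].append(self.history[message_id])


svc = MessageService()
svc.registry["bob"] = "gw-2"                     # Bob is online on gw-2
online_id = svc.send("alice", "bob", "hi bob")   # pushed immediately
offline_id = svc.send("alice", "carol", "hi carol")  # Carol is offline: stored + notified
svc.on_reconnect("carol", "gw-1")                # stored message is delivered now
```

Note that the persist step runs on both branches; only the delivery step differs between the online and offline paths.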
Deep dives
- The session registry is the key scaling piece: with sockets spread across thousands of gateways, you need a fast `user_id → gateway` map (Redis) so a message can find its recipient’s connection. This decouples “who’s connected where” from the message logic.
- Ordering: attach a per-conversation sequence number or timestamp so the client can order/dedupe. Within one conversation, route through a consistent partition to preserve order.
- Delivery guarantees & receipts: the persist-then-deliver flow gives at-least-once delivery; the client ACKs receipt (→ “delivered”), and a read event flows back as another small message (→ “read”). Message handling must be idempotent (dedupe by message ID), since at-least-once delivery means a client may retry and the same message can arrive twice.
- Group chat is the natural extension: fan the message out to every member’s gateway (small groups → push to each; this is the news-feed fan-out pattern again).
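A small client-side sketch of the ordering, dedupe, and receipt ideas from the list above. The names `ConversationInbox`, `Receipt`, and `advance` are hypothetical, and the sketch assumes the server has already assigned each message a unique per-conversation sequence number.

```python
from enum import IntEnum


class ConversationInbox:
    """Client-side view of one conversation."""

    def __init__(self):
        self._by_id = {}  # message_id -> (seq, text)

    def receive(self, message_id, seq, text):
        """Return True if the message is new, False if it is a duplicate.

        The caller should ACK in both cases so the server stops retrying.
        """
        if message_id in self._by_id:
            return False  # idempotent: a redelivered message changes nothing
        self._by_id[message_id] = (seq, text)
        return True

    def ordered_texts(self):
        # Display in sequence order, regardless of arrival order.
        ordered = sorted(self._by_id.values(), key=lambda item: item[0])
        return [text for _seq, text in ordered]


class Receipt(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


def advance(current, event):
    """Receipts only move forward; a late 'delivered' never undoes 'read'."""
    return max(current, event)


inbox = ConversationInbox()
first = inbox.receive("m2", 2, "world")   # arrives out of order
second = inbox.receive("m1", 1, "hello")
retry = inbox.receive("m2", 2, "world")   # duplicate from a retry

status = advance(Receipt.SENT, Receipt.DELIVERED)
status = advance(status, Receipt.READ)
status = advance(status, Receipt.DELIVERED)  # stale event arrives late
```

Treating the receipt as a value that can only increase is one simple way to make out-of-order receipt events harmless; other designs are possible.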
Persist before you deliver. If you push to the recipient but crash before saving, an offline or reconnecting user loses the message forever. Writing to durable storage first (then delivering) is what lets you guarantee “no message is ever lost,” which is non-negotiable for a messaging product.
Analysis
- Delivery latency: sub-second when both users are online (one persist + one socket push).
- Connection load: the dominant cost. Hundreds of millions of concurrent WebSockets are spread across a gateway fleet and tracked in the session registry.
Same skin
- Live notifications, multiplayer game state, collaborative editing, stock tickers — all are “server pushes to many connected clients in real time” → WebSockets + a session registry.
- The offline-queue + push-notification path is exactly the notification service.
- Group fan-out reuses the news feed push model.