Design a Notification Service (hard)
The prompt
A central service other systems call to notify users across multiple channels — mobile push, SMS, email, in-app — reliably and at scale. “Your order shipped,” “new login detected,” “someone liked your post.” The challenge isn’t any single send; it’s doing millions of them reliably through flaky third-party providers.
Requirements
- Functional: accept a “notify user U with content C via channels X” request; deliver across push (APNs/FCM), SMS, email; respect user preferences and opt-outs.
- Non-functional: reliable (don’t drop notifications), scalable (millions/day, bursty), no duplicates (don’t send the same alert twice), decoupled (a slow email provider shouldn’t block push).
Estimation
10 M notifications/day → ~116/s average (10,000,000 / 86,400 s) but very bursty (a marketing blast or an incident can spike to tens of thousands/s). Burstiness + unreliable external providers → this is a textbook queue + worker design.
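To make the estimate concrete, here is a small back-of-the-envelope calculation. The daily volume comes from the text; the burst rate, drain rate, and burst duration are made-up illustrative numbers, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

daily_volume = 10_000_000                # from the estimate above
average_rate = daily_volume / SECONDS_PER_DAY
print(f"average: {average_rate:.2f}/s")  # average: 115.74/s

# Hypothetical burst (assumed numbers): 20,000/s arrive for 5 minutes
# while the workers can only drain 2,000/s.
burst_rate = 20_000
drain_rate = 2_000
burst_seconds = 5 * 60

# Everything that arrives faster than it can be drained piles up in the queue.
backlog = (burst_rate - drain_rate) * burst_seconds

# After the burst, arrivals fall back to the average rate, so the backlog
# shrinks at (drain_rate - average_rate) jobs per second.
catch_up_minutes = backlog / (drain_rate - average_rate) / 60

print(f"backlog after burst: {backlog:,} jobs")
print(f"time to catch up: {catch_up_minutes:.1f} min")
```

Under these assumptions the queue grows to a few million jobs and takes most of an hour to drain, which is why queue depth and worker count matter more than the daily average.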
The core pattern: queue-per-channel with workers
Decouple the request from the delivery. The API validates and enqueues instantly; per-channel worker pools drain the queues and call the external providers at their own pace. A spike just makes the queues longer — nothing falls over.
The flow:
- A service calls POST /notify with user, content, and channels.
- The API validates, checks user preferences/opt-outs, looks up device tokens / phone / email, and enqueues a job per channel.
- Per-channel workers dequeue and call the external provider (APNs, FCM, Twilio, SES), handling provider-specific quirks.
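The flow above can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: `Job`, `PREFERENCES`, `CONTACTS`, `QUEUES`, `notify`, and `drain` are all hypothetical names, the dictionaries stand in for a preference service and a contact lookup, and `queue.Queue` stands in for a persistent message broker. It also assumes a channel is off unless the user has opted in.

```python
import queue
from dataclasses import dataclass


@dataclass
class Job:
    notification_id: str
    user_id: str
    channel: str
    address: str   # device token, phone number, or email address
    content: str


# Hypothetical stand-ins for the preference service and contact lookup.
PREFERENCES = {"u1": {"push": True, "sms": False, "email": True}}
CONTACTS = {"u1": {"push": "device-token-abc", "sms": "+15550100", "email": "u1@example.com"}}

# One queue per channel, so a slow provider only backs up its own queue.
QUEUES = {channel: queue.Queue() for channel in ("push", "sms", "email")}


def notify(notification_id, user_id, content, channels):
    """Validate, apply opt-outs, look up addresses, enqueue one job per channel."""
    unknown = [c for c in channels if c not in QUEUES]
    if unknown:
        raise ValueError(f"unsupported channels: {unknown}")
    contacts = CONTACTS.get(user_id)
    if contacts is None:
        raise ValueError(f"unknown user: {user_id}")
    prefs = PREFERENCES.get(user_id, {})
    enqueued = []
    for channel in channels:
        address = contacts.get(channel)
        if not prefs.get(channel, False) or address is None:
            continue  # opted out, or no address on file for this channel
        QUEUES[channel].put(Job(notification_id, user_id, channel, address, content))
        enqueued.append(channel)
    return enqueued


def drain(channel, send):
    """Worker body: pull every waiting job for one channel and hand it to `send`."""
    sent = 0
    while True:
        try:
            job = QUEUES[channel].get_nowait()
        except queue.Empty:
            return sent
        send(job)  # in a real worker this would call APNs, FCM, Twilio, or SES
        sent += 1


enqueued = notify("n1", "u1", "Your order shipped", ["push", "sms", "email"])
print(enqueued)  # ['push', 'email'] because u1 has SMS switched off
```

The point of the shape is that `notify` returns as soon as the jobs are enqueued; how fast `drain` runs for each channel is a separate concern.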
Deep dives
- Reliability & retries: providers fail transiently. Workers retry with exponential backoff; after N failures, route to a dead-letter queue for inspection rather than silently dropping. Because the queue is persistent and a job is acknowledged only after the provider call succeeds, a notification isn’t lost if a worker crashes mid-send; the job is simply redelivered.
- Idempotency (no duplicates): queues give at-least-once delivery, so a job can be processed twice. Attach a dedupe key (notification ID) and track sent IDs (in Redis) so a retry of an already-sent notification is a no-op. This is the most important correctness detail — nobody wants the same SMS twice.
- Rate limiting: needed both to respect provider quotas (Twilio caps your send rate) and to avoid spamming a user, so cap notifications per user per time window. Reuse the rate limiter design for both.
- Preferences & templating: a preference service (opt-outs, quiet hours, per-channel settings) gates sends; a template service renders content per channel/locale.
- Priority: a 2FA code must beat a marketing blast — separate high/low priority queues so transactional notifications jump the line.
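The retry bullet above (exponential backoff, then a dead-letter queue after N failures) can be sketched as follows. Treat it as one possible shape under stated assumptions: `TransientProviderError`, `backoff_delay`, `deliver_with_retries`, and `FlakyProvider` are hypothetical names, the dead-letter queue is a plain list, and the random jitter on the delay is an addition the text doesn't mention. A real worker would more likely re-enqueue the job with a delay than sleep inline.

```python
import random
import time


class TransientProviderError(Exception):
    """Hypothetical error type for a retryable provider failure (timeout, 5xx)."""


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with jitter: a random delay up to base * 2**attempt, capped."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def deliver_with_retries(job, send, dead_letters, max_attempts=5, sleep=time.sleep):
    """Send a job; retry transient failures with backoff; dead-letter it after max_attempts."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            send(job)
            return True
        except TransientProviderError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    # Out of attempts: park the job for inspection instead of dropping it.
    dead_letters.append((job, repr(last_error)))
    return False


class FlakyProvider:
    """Test double that fails a fixed number of times before succeeding."""

    def __init__(self, failures_before_success):
        self.remaining = failures_before_success
        self.calls = 0

    def __call__(self, job):
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise TransientProviderError("provider timeout")


dead_letters = []
no_sleep = lambda seconds: None  # skip real waiting in the demo

flaky = FlakyProvider(failures_before_success=2)
ok = deliver_with_retries("job-1", flaky, dead_letters, sleep=no_sleep)

down = FlakyProvider(failures_before_success=100)
failed = deliver_with_retries("job-2", down, dead_letters, sleep=no_sleep)

print(ok, flaky.calls)      # True 3
print(failed, down.calls)   # False 5
print(len(dead_letters))    # 1
```

Only errors classified as transient are retried here; anything else (an invalid phone number, a revoked device token) propagates immediately, since retrying a permanent failure just burns provider quota.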
At-least-once + idempotency is the combo to say out loud. You want at-least-once delivery (retries) so nothing is lost — but that means duplicates are possible, so consumers must dedupe by an idempotency key. “I’d make the workers idempotent with a dedupe key so retries are safe” is the single sentence that signals you’ve built a real queue-based system, not just drawn one.
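A minimal sketch of that idempotent worker, assuming an in-memory stand-in for the shared store. `SentStore` and `process` are hypothetical names. In a real deployment `claim` would be a single atomic set-if-absent against Redis (the `SET` command with the `NX` option, usually with an expiry), so that two workers racing on the same job can't both win. Keying on notification ID plus channel is also an assumption, made because one notification fans out into one job per channel.

```python
class SentStore:
    """In-memory stand-in for the shared dedupe store (the text suggests Redis)."""

    def __init__(self):
        self._keys = set()

    def claim(self, key):
        """Set-if-absent: returns True only for the first caller with this key."""
        if key in self._keys:
            return False
        self._keys.add(key)
        return True

    def release(self, key):
        """Give the key back so a later retry is allowed to send."""
        self._keys.discard(key)


def process(job, send, store):
    """Handle one job so that a redelivered copy becomes a no-op."""
    dedupe_key = f"sent:{job['notification_id']}:{job['channel']}"
    if not store.claim(dedupe_key):
        return "skipped"  # a previous delivery of this job already went out
    try:
        send(job)
    except Exception:
        # The send failed, so release the claim and let the retry path try again.
        store.release(dedupe_key)
        raise
    return "sent"


store = SentStore()
outbox = []
job = {"notification_id": "n-42", "channel": "sms", "body": "Your code is 123456"}

first = process(job, outbox.append, store)
second = process(job, outbox.append, store)  # the queue redelivers the same job

print(first, second, len(outbox))  # sent skipped 1
```

Claiming the key before sending trades one failure mode for another: a worker that crashes after the claim but before the send leaves a key that blocks the retry until it expires. Marking the key after the send instead reopens a small window for a duplicate. Which side to err on depends on whether a missed or a doubled notification is worse for that channel.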
Analysis
- Throughput: scales per channel by adding workers; queues absorb bursts.
- Reliability: persistent queues + retries + dead-letter queue → no silent drops.
- Correctness: idempotency keys prevent duplicate sends under at-least-once delivery.
Same skin
- The offline-message path of the chat app is a notification service in miniature.
- Email/SMS marketing platforms, alerting/on-call systems (PagerDuty), order-status pipelines — same queue + worker + retry + idempotency skeleton.
- Message queues and the fan-out idea from the news feed are the load-bearing patterns.