Design Typeahead / Autocomplete medium
The prompt
As a user types into a search box, return the top 5 most relevant completions for the current prefix — the dropdown under Google/YouTube search. Fires on every keystroke, so it lives or dies on latency.
Requirements
- Functional: given a prefix, return the top-k (≈5) suggestions, ranked by popularity. Suggestions update as data/trends change.
- Non-functional: brutal latency budget (p99 well under ~100 ms — it must feel instant while typing), massive read volume, eventual freshness is fine (a new trending term can take minutes to appear).
Estimation
If 100 M searches/day and a user types ~20 characters per search → ~2 B keystroke queries/day ≈ ~23k/s average, much higher at peak. This is read-dominated and latency-critical — everything is about a fast, cached prefix lookup.
The core data structure — a trie
A trie (prefix tree, Day 18) maps prefixes to completions naturally: walk down the tree following the typed characters, then return the best descendants. To hit the latency budget, precompute and store the top-k completions at each node so a lookup is “walk to the prefix node, read its cached top-5” — no subtree search at query time.
Data model & serving
- Build offline: aggregate query logs → frequency per term → build the trie with top-k materialized at each node. Rebuild periodically (hourly/daily) from the latest logs.
- Serve from memory: the trie (or a flattened prefix → top-k map) lives in RAM / a cache like Redis. Prefixes are sharded across servers (e.g. by first 1–2 characters).
High-level design
Deep dives
- Latency: debounce on the client (don’t fire on every keystroke — wait ~50–100 ms of idle), serve entirely from RAM, and shard by prefix so each server holds a slice.
- Ranking: top-k by historical frequency, optionally time-decayed so trending terms rise. Personalization and typo-tolerance (fuzzy matching) are good “if we had more time” extensions to mention.
- Freshness vs cost: rebuilding the whole trie often is expensive. Periodic full rebuilds + a smaller “recent/trending” layer merged at query time is a common compromise.
The key insight: typeahead pushes all the expensive work (ranking, aggregation) offline, so the online path is a pure cached lookup. This “precompute the answer, serve from memory” pattern is the same trick behind the URL shortener’s read path and the news feed’s fan-out-on-write — when reads massively outnumber writes, do the work at write/build time.
Analysis
- Query latency: single-digit ms (in-memory prefix lookup).
- Memory:
O(total characters in vocabulary)for the trie +kper node.
Same skin
- Search-as-you-type in any product (IDEs, command palettes, e-commerce search).
- Trie problems from Day 18 — word search, prefix matching, the same structure at single-machine scale.
- Trending / top-k over a stream — the offline ranking layer is a top-k heap problem over query logs.