Design Typeahead / Autocomplete

Design Typeahead / Autocomplete medium

The prompt

As a user types into a search box, return the top 5 most relevant completions for the current prefix — the dropdown under Google/YouTube search. Fires on every keystroke, so it lives or dies on latency.

Requirements

Functional: given a prefix, return the top-k (≈5) suggestions, ranked by popularity. Suggestions update as data/trends change.
Non-functional: brutal latency budget (p99 well under ~100 ms — it must feel instant while typing), massive read volume, eventual freshness is fine (a new trending term can take minutes to appear).

Estimation

If 100 M searches/day and a user types ~20 characters per search → ~2 B keystroke queries/day ≈ ~23k/s average, much higher at peak. This is read-dominated and latency-critical — everything is about a fast, cached prefix lookup.

The core data structure — a trie

A trie (prefix tree, Day 18) maps prefixes to completions naturally: walk down the tree following the typed characters, then return the best descendants. To hit the latency budget, precompute and store the top-k completions at each node so a lookup is “walk to the prefix node, read its cached top-5” — no subtree search at query time.

Trie with top-k cached at each node — query = walk + read

Each node stores its precomputed top-k suggestions. Typing 'ca' → jump to that node → instantly return its cached list. The expensive ranking work happens offline, not on the keystroke.

Data model & serving

Build offline: aggregate query logs → frequency per term → build the trie with top-k materialized at each node. Rebuild periodically (hourly/daily) from the latest logs.
Serve from memory: the trie (or a flattened prefix → top-k map) lives in RAM / a cache like Redis. Prefixes are sharded across servers (e.g. by first 1–2 characters).

High-level design

Hot read path served from memory; ranking built offline

The keystroke path only ever reads precomputed top-k from memory. A separate offline pipeline mines query logs, rebuilds the ranked trie, and publishes it — keeping the hot path trivial.

Deep dives

Latency: debounce on the client (don’t fire on every keystroke — wait ~50–100 ms of idle), serve entirely from RAM, and shard by prefix so each server holds a slice.
Ranking: top-k by historical frequency, optionally time-decayed so trending terms rise. Personalization and typo-tolerance (fuzzy matching) are good “if we had more time” extensions to mention.
Freshness vs cost: rebuilding the whole trie often is expensive. Periodic full rebuilds + a smaller “recent/trending” layer merged at query time is a common compromise.

The key insight: typeahead pushes all the expensive work (ranking, aggregation) offline, so the online path is a pure cached lookup. This “precompute the answer, serve from memory” pattern is the same trick behind the URL shortener’s read path and the news feed’s fan-out-on-write — when reads massively outnumber writes, do the work at write/build time.

Analysis

Query latency: single-digit ms (in-memory prefix lookup).
Memory: O(total characters in vocabulary) for the trie + k per node.

Same skin

Search-as-you-type in any product (IDEs, command palettes, e-commerce search).
Trie problems from Day 18 — word search, prefix matching, the same structure at single-machine scale.
Trending / top-k over a stream — the offline ranking layer is a top-k heap problem over query logs.

URL Shortener (TinyURL)News Feed

Finished this page?