The DSU Template
The core data structure is a single array.
parent: [0, 1, 2, 3, ..., n-1]Initially parent[i] = i — every element is its own root, in its own singleton tree.
Naive find
find(x) walks parent pointers up to the root:
def find(x):
while parent[x] != x:
x = parent[x]
return xThe root is the element whose parent is itself.
Naive union
union(x, y) finds the roots of x and y. If they’re different, make one root point at the other:
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry:
return False # already in same set
parent[rx] = ry
return True # union happenedReturning True/False is a useful convention — it tells the caller whether anything actually merged, which doubles as a cycle-detection signal.
The naive code, end to end
That’s it. Functionally correct, easy to read, and dangerously slow on a bad input.
The worst case — a degenerate chain
Build a DSU on {0, 1, ..., n-1}, then call:
union(0, 1)
union(1, 2)
union(2, 3)
...
union(n-2, n-1)After each union, the smaller-indexed root gets re-parented to the larger one:
After union(0, 1): 0 → 1
After union(1, 2): 0 → 1 → 2
After union(2, 3): 0 → 1 → 2 → 3
...
After union(n-2, n-1): 0 → 1 → 2 → ... → n-1The tree is a linked list. Now find(0) walks all n parent pointers — O(n).
m operations on a degenerate DSU cost O(m · n). That’s worse than naive renaming with a dict. We need to fix this.
Two complementary fixes
The two optimizations attack the problem from different angles:
- Union by rank (or size) — when uniting two trees, attach the smaller tree under the larger root. Trees stay balanced; max depth grows like
log ninstead ofn. - Path compression — every time
find(x)walks up to the root, re-point every visited node directly to the root. Futurefinds on those nodes areO(1).
Each one alone gives a meaningful speedup. Together they’re devastating — amortized O(α(n)) per operation. That’s the next page.
Forgetting both optimizations is the easiest correctness-but-not-performance mistake to make in an interview. The code “works,” passes the small tests, and times out on the large one. Always write the optimized version from the start.
Common variants of the template
A few small extensions show up repeatedly:
- Component count. Keep an integer
componentsinitialized ton. Decrement it wheneverunionreturnsTrue. At the end,componentsis the number of connected groups. - Component sizes. Maintain a
size[]array alongsideparent[]. When uniting, the new root’s size is the sum of the two old roots’ sizes. Use cases: largest component, query “how big isx’s group?”. - Weighted DSU. Store an offset
weight[x]= “distance fromxto its parent.” Maintained throughfindandunion. Powers parity-check and equality-with-offset problems.
Quick check
Next: the two optimizations and the amortized analysis that makes DSU shockingly fast.