DSU Optimizations
The chain that just timed out had height
n. I’m about to show you two changes — neither more than three lines — that together drag the amortized cost per operation down to a number that is at most 5 for any input that fits in the observable universe. The second change does its work for free, on a walk you were going to make anyway. How does “walk to the root” turn into “and now that whole branch is flat”?
Two ideas. Independently, each speeds DSU up. Together, they produce one of the most elegant amortized-complexity results in computer science.
Optimization 1 — union by rank
The naive union always re-parents the same way (parent[rx] = ry). That’s how we got the degenerate chain.
Union by rank: track the approximate height of each tree, and always attach the shorter tree under the taller root. Heights then grow only when two equal-height trees are joined — and the new height is just one more.
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry:
return False
if rank[rx] < rank[ry]:
rx, ry = ry, rx # swap so rx is the taller (or equal)
parent[ry] = rx # attach shorter under taller
if rank[rx] == rank[ry]:
rank[rx] += 1 # equal heights merged → height grows by 1
return Truerank[] is initialized to all zeros.
The bound: with union by rank alone, every tree’s height is at most log₂(n), so find is O(log n). That’s already a huge improvement over O(n).
Union by size (the alternative)
Same idea, but track the size (element count) of each tree instead of an abstract rank, and attach the smaller-size tree under the larger:
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry:
return False
if size[rx] < size[ry]:
rx, ry = ry, rx
parent[ry] = rx
size[rx] += size[ry]Same asymptotic bound (O(log n) height), and size[] is sometimes useful for other reasons (querying component size). In interviews, either is fine; rank is slightly tighter in theory, size is more readable.
Optimization 2 — path compression
The other side of the attack: flatten trees during find.
Every time find(x) walks x → p → pp → ... → root, re-point every node on that walk directly to the root. Next time you find(x) (or any node on that walk), it’s O(1).
def find(x):
if parent[x] != x:
parent[x] = find(parent[x]) # recursively find root AND re-point
return parent[x]The recursive form is the simplest: find(parent[x]) returns the root, and we assign that root as x’s new parent on the way back up. Every node on the walk gets re-parented to the root in a single sweep.
The iterative version is uglier but avoids deep recursion (which can blow the stack for very tall trees in Python/Java):
def find(x):
# find the root
root = x
while parent[root] != root:
root = parent[root]
# second pass: re-point everyone on the path
while parent[x] != root:
parent[x], x = root, parent[x]
return rootBoth flatten the path. Use whichever you can write cleanly under pressure.
Rebuild path compression from memory
The recursive find is four lines, and exactly one of them does the flattening. Recall it: it must both recurse up to find the root and store that root back as x’s parent.
Both, together
Thirty lines (rough). This is the version to memorize.
The amortized complexity — inverse Ackermann
With both optimizations, the amortized cost per operation is O(α(n)), where α is the inverse Ackermann function. Tarjan proved this in 1975.
Why “amortized”? Individual finds can still walk a path — but path compression shortens the path so much for future operations that, averaged across m operations, the total cost is O(m · α(n)).
The Ackermann function grows so fast it makes 2^n look quaint. Its inverse grows so slowly:
| n | α(n) |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 4 | 2 |
| 16 | 3 |
| 65,536 | 4 |
| 2^65,536 | 5 |
For any input that fits in the observable universe, α(n) ≤ 5. In practice, treat DSU as O(1) amortized per operation.
The full proof of the α(n) bound is genuinely beautiful and well outside interview scope. The interview-relevant takeaway: with both optimizations, every DSU operation is essentially constant time, even though individual finds can technically walk a path. Anyone arguing about “logarithmic vs constant” for DSU is missing the point — it’s α(n), which is so close to constant that nobody distinguishes in practice.
What if you forget one?
- Only path compression, no union by rank. Amortized
O(log n)per operation. Still very fast. Often acceptable for interviews. - Only union by rank, no path compression. Worst-case
O(log n)per operation. Same asymptotic ceiling, but no amortization needed for the bound. - Neither.
O(n)worst case per operation. Don’t.
In an interview, write both. It’s only a few extra lines, and you avoid having to explain why you skipped one.
Quick check
These two changes are what separate the textbook-correct DSU from the template that timed out: rank keeps the tree short, compression keeps it flat, and together they retire the O(n) chain for good. Now that find/union are effectively free, we can spend them lavishly. Next: the four canonical applications — connected components, undirected cycle detection, Kruskal’s MST, and weighted DSU.