🚀 Phases 1–5 are live — Days 1–17 cover the foundations and the algorithmic patterns. See the roadmap →
Day 20 - Union FindOptimizations

DSU Optimizations

Two ideas. Independently, each speeds DSU up. Together, they produce one of the most elegant amortized-complexity results in computer science.

Optimization 1 — union by rank

The naive union always re-parents the same way (parent[rx] = ry). That’s how we got the degenerate chain.

Union by rank: track the approximate height of each tree, and always attach the shorter tree under the taller root. Heights then grow only when two equal-height trees are joined — and the new height is just one more.

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return False
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx        # swap so rx is the taller (or equal)
    parent[ry] = rx            # attach shorter under taller
    if rank[rx] == rank[ry]:
        rank[rx] += 1          # equal heights merged → height grows by 1
    return True

rank[] is initialized to all zeros.

The bound: with union by rank alone, every tree’s height is at most log₂(n), so find is O(log n). That’s already a huge improvement over O(n).

Union by size (the alternative)

Same idea, but track the size (element count) of each tree instead of an abstract rank, and attach the smaller-size tree under the larger:

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return False
    if size[rx] < size[ry]:
        rx, ry = ry, rx
    parent[ry] = rx
    size[rx] += size[ry]

Same asymptotic bound (O(log n) height), and size[] is sometimes useful for other reasons (querying component size). In interviews, either is fine; rank is slightly tighter in theory, size is more readable.

Optimization 2 — path compression

The other side of the attack: flatten trees during find.

Every time find(x) walks x → p → pp → ... → root, re-point every node on that walk directly to the root. Next time you find(x) (or any node on that walk), it’s O(1).

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])   # recursively find root AND re-point
    return parent[x]

The recursive form is the simplest: find(parent[x]) returns the root, and we assign that root as x’s new parent on the way back up. Every node on the walk gets re-parented to the root in a single sweep.

The iterative version is uglier but avoids deep recursion (which can blow the stack for very tall trees in Python/Java):

def find(x):
    # find the root
    root = x
    while parent[root] != root:
        root = parent[root]
    # second pass: re-point everyone on the path
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

Both flatten the path. Use whichever you can write cleanly under pressure.

Both, together

Thirty lines (rough). This is the version to memorize.

The amortized complexity — inverse Ackermann

With both optimizations, the amortized cost per operation is O(α(n)), where α is the inverse Ackermann function. Tarjan proved this in 1975.

Why “amortized”? Individual finds can still walk a path — but path compression shortens the path so much for future operations that, averaged across m operations, the total cost is O(m · α(n)).

The Ackermann function grows so fast it makes 2^n look quaint. Its inverse grows so slowly:

nα(n)
00
11
42
163
65,5364
2^65,5365

For any input that fits in the observable universe, α(n) ≤ 5. In practice, treat DSU as O(1) amortized per operation.

The full proof of the α(n) bound is genuinely beautiful and well outside interview scope. The interview-relevant takeaway: with both optimizations, every DSU operation is essentially constant time, even though individual finds can technically walk a path. Anyone arguing about “logarithmic vs constant” for DSU is missing the point — it’s α(n), which is so close to constant that nobody distinguishes in practice.

What if you forget one?

  • Only path compression, no union by rank. Amortized O(log n) per operation. Still very fast. Often acceptable for interviews.
  • Only union by rank, no path compression. Worst-case O(log n) per operation. Same asymptotic ceiling, but no amortization needed for the bound.
  • Neither. O(n) worst case per operation. Don’t.

In an interview, write both. It’s only a few extra lines, and you avoid having to explain why you skipped one.

Quick check

What does path compression do during find(x)?
What's the amortized complexity of DSU with both union by rank AND path compression?
If you use path compression but NOT union by rank, what's the amortized complexity?

Next: the four canonical applications — Kruskal’s, cycle detection, component counting, and weighted DSU.