DSU Optimizations

The chain that just timed out had height n. I’m about to show you two changes — neither more than three lines — that together drag the amortized cost per operation down to a number that is at most 5 for any input that fits in the observable universe. The second change does its work for free, on a walk you were going to make anyway. How does “walk to the root” turn into “and now that whole branch is flat”?

Two ideas. Independently, each speeds DSU up. Together, they produce one of the most elegant amortized-complexity results in computer science.

Optimization 1 — union by rank

The naive union always re-parents the same way (parent[rx] = ry). That’s how we got the degenerate chain.

Union by rank: track the approximate height of each tree, and always attach the shorter tree under the taller root. Heights then grow only when two equal-height trees are joined — and the new height is just one more.

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return False
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx        # swap so rx is the taller (or equal)
    parent[ry] = rx            # attach shorter under taller
    if rank[rx] == rank[ry]:
        rank[rx] += 1          # equal heights merged → height grows by 1
    return True

rank[] is initialized to all zeros.

The bound: with union by rank alone, every tree’s height is at most log₂(n), so find is O(log n). That’s already a huge improvement over O(n).

Union by size (the alternative)

Same idea, but track the size (element count) of each tree instead of an abstract rank, and attach the smaller-size tree under the larger:

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return False
    if size[rx] < size[ry]:
        rx, ry = ry, rx
    parent[ry] = rx
    size[rx] += size[ry]

Same asymptotic bound (O(log n) height), and size[] is sometimes useful for other reasons (querying component size). In interviews, either is fine; rank is slightly tighter in theory, size is more readable.

Optimization 2 — path compression

The other side of the attack: flatten trees during find.

Every time find(x) walks x → p → pp → ... → root, re-point every node on that walk directly to the root. Next time you find(x) (or any node on that walk), it’s O(1).

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])   # recursively find root AND re-point
    return parent[x]

The recursive form is the simplest: find(parent[x]) returns the root, and we assign that root as x’s new parent on the way back up. Every node on the walk gets re-parented to the root in a single sweep.

The iterative version is uglier but avoids deep recursion (which can blow the stack for very tall trees in Python/Java):

def find(x):
    # find the root
    root = x
    while parent[root] != root:
        root = parent[root]
    # second pass: re-point everyone on the path
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

Both flatten the path. Use whichever you can write cleanly under pressure.

Predict first

Take the degenerate chain 0 → 1 → 2 → ... → 7 from the previous page and call find(0) ONCE with path compression. What does the parent array look like immediately after — and what does find(3) cost now?

Rebuild path compression from memory

The recursive find is four lines, and exactly one of them does the flattening. Recall it: it must both recurse up to find the root and store that root back as x’s parent.

python · fill in the blanks0/1 hints

def find(self, x):
  if self.parent[x] != x:
      # ??? recurse to the root AND re-point x in one line
  return self.parent[x]

Run it yourself — Python

parent = list(range(8))
# build a degenerate chain: 0 -> 1 -> 2 -> ... -> 7
for i in range(7):
  parent[i] = i + 1

def find(x):
  if parent[x] != x:
      parent[x] = find(parent[x])   # path compression
  return parent[x]

print("before  :", parent)
root = find(0)
print("find(0) :", root)
print("after   :", parent)
print("find(3) now reaches the root in one hop:", parent[3] == root)

Both, together

Thirty lines (rough). This is the version to memorize.

The amortized complexity — inverse Ackermann

With both optimizations, the amortized cost per operation is O(α(n)), where α is the inverse Ackermann function. Tarjan proved this in 1975.

Why “amortized”? Individual finds can still walk a path — but path compression shortens the path so much for future operations that, averaged across m operations, the total cost is O(m · α(n)).

The Ackermann function grows so fast it makes 2^n look quaint. Its inverse grows so slowly:

n	α(n)
0	0
1	1
4	2
16	3
65,536	4
2^65,536	5

For any input that fits in the observable universe, α(n) ≤ 5. In practice, treat DSU as O(1) amortized per operation.

The full proof of the α(n) bound is genuinely beautiful and well outside interview scope. The interview-relevant takeaway: with both optimizations, every DSU operation is essentially constant time, even though individual finds can technically walk a path. Anyone arguing about “logarithmic vs constant” for DSU is missing the point — it’s α(n), which is so close to constant that nobody distinguishes in practice.

What if you forget one?

Only path compression, no union by rank. Amortized O(log n) per operation. Still very fast. Often acceptable for interviews.
Only union by rank, no path compression. Worst-case O(log n) per operation. Same asymptotic ceiling, but no amortization needed for the bound.
Neither. O(n) worst case per operation. Don’t.

In an interview, write both. It’s only a few extra lines, and you avoid having to explain why you skipped one.

Quick check

What does path compression do during find(x)?

What's the amortized complexity of DSU with both union by rank AND path compression?

If you use path compression but NOT union by rank, what's the amortized complexity?

These two changes are what separate the textbook-correct DSU from the template that timed out: rank keeps the tree short, compression keeps it flat, and together they retire the O(n) chain for good. Now that find/union are effectively free, we can spend them lavishly. Next: the four canonical applications — connected components, undirected cycle detection, Kruskal’s MST, and weighted DSU.

The DSU Template Applications

Finished this page?