🎉 All 30 days are live — the full DSA-30 course, from Big-O to System Design. See the roadmap →
Day 20 - Union FindThe DSU Template

The DSU Template

Here’s DSU code that’s correct — it passes every small test on the first submission, then times out on the large one with not a single character changed. The bug isn’t a typo. It’s the shape of the tree your unions quietly built up. Read the code below, then watch it fall over.

The core data structure is a single array.

parent: [0, 1, 2, 3, ..., n-1]

Initially parent[i] = i — every element is its own root, in its own singleton tree.

Naive find

find(x) walks parent pointers up to the root:

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

The root is the element whose parent is itself.

Naive union

union(x, y) finds the roots of x and y. If they’re different, make one root point at the other:

def union(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return False            # already in same set
    parent[rx] = ry
    return True                 # union happened

Returning True/False is a useful convention — it tells the caller whether anything actually merged, which doubles as a cycle-detection signal.

The naive code, end to end

That’s it. Functionally correct, easy to read, and dangerously slow on a bad input.

Rebuild union from memory

Two lines carry the entire correctness of union. Recall them: which value must x and y be reduced to before you compare or re-point, and which single pointer actually performs the merge?

python · fill in the blanks0/2 hints
class DSU:
def __init__(self, n):
self.parent = list(range(n))
def find(self, x):
while self.parent[x] != x:
x = self.parent[x]
return x
def union(self, x, y):
# ??? the line that makes union correct
if rx == ry:
return False
# ??? the line that makes union correct
return True

The worst case — a degenerate chain

Build a DSU on {0, 1, ..., n-1}, then call:

union(0, 1)
union(1, 2)
union(2, 3)
...
union(n-2, n-1)
Predict first
After this exact sequence of n-1 unions, what shape is the single resulting tree — and what does one call to find(0) cost?

After each union, the smaller-indexed root gets re-parented to the larger one:

After union(0, 1):  0 → 1
After union(1, 2):  0 → 1 → 2
After union(2, 3):  0 → 1 → 2 → 3
...
After union(n-2, n-1):  0 → 1 → 2 → ... → n-1

The tree is a linked list. Now find(0) walks all n parent pointers — O(n).

m operations on a degenerate DSU cost O(m · n). That’s worse than naive renaming with a dict. We need to fix this.

Two complementary fixes

The two optimizations attack the problem from different angles:

  1. Union by rank (or size) — when uniting two trees, attach the smaller tree under the larger root. Trees stay balanced; max depth grows like log n instead of n.
  2. Path compression — every time find(x) walks up to the root, re-point every visited node directly to the root. Future finds on those nodes are O(1).

Each one alone gives a meaningful speedup. Together they’re devastating — amortized O(α(n)) per operation. That’s the next page.

⚠️

Forgetting both optimizations is the easiest correctness-but-not-performance mistake to make in an interview. The code “works,” passes the small tests, and times out on the large one. Always write the optimized version from the start.

Common variants of the template

A few small extensions show up repeatedly:

  • Component count. Keep an integer components initialized to n. Decrement it whenever union returns True. At the end, components is the number of connected groups.
  • Component sizes. Maintain a size[] array alongside parent[]. When uniting, the new root’s size is the sum of the two old roots’ sizes. Use cases: largest component, query “how big is x’s group?”.
  • Weighted DSU. Store an offset weight[x] = “distance from x to its parent.” Maintained through find and union. Powers parity-check and equality-with-offset problems.

Quick check

With NAIVE find and union (no optimizations), what's the worst-case time for n operations?
In the union() function, why do we always work with the ROOTS of x and y, not x and y themselves?

This page took the partition idea from the introduction and made it concrete code — then showed the same code self-destructing into the O(n) chain you predicted above. Next: the optimizations — union by rank and path compression, the two small changes that flatten that chain and drag the per-operation cost down to effectively constant.

Finished this page?