🚀 Phases 1–4 are live — Days 1–13 + Day 17 are fully written. See the roadmap →

The Two-Heap Pattern

A single heap gives you O(1) access to one extreme. Two heaps glued together give you O(1) access to the middle.

That single observation solves a family of problems that look impossible at first: maintain the running median of a stream, run a sliding-window median, schedule capacity in a min-max optimization, and more.

The setup

Two heaps:

  • A max-heap holding the smaller half of the elements. Top = the largest of the small half.
  • A min-heap holding the larger half. Top = the smallest of the large half.

Together they partition the data into a “low half” and a “high half” — and the median sits at the boundary.

            Max-heap (smaller half)     Min-heap (larger half)
                  TOP                        TOP
                ┌─────┐                    ┌─────┐
                │  5  │ ◄── median if odd  │  6  │
                └─────┘                    └─────┘
            [..., 3, 1, 4]                [6, 7, 8, ...]

The two invariants we maintain:

  1. Every element in the max-heap ≤ every element in the min-heap. (The partition is correct.)
  2. Size difference is at most 1. Otherwise the median would be off-center.

If we hold both invariants, the median is trivially:

  • The max-heap’s top (if it has the extra element).
  • The min-heap’s top (if the min-heap has the extra element).
  • The average of the two tops (if sizes are equal).

The algorithm

Insert in three steps:

add(x):
    1. push x onto one of the heaps (typically max-heap if x is smaller,
       or just always max-heap and rebalance after)
    2. enforce invariant 1 — if max-heap.top > min-heap.top, pop and swap
    3. enforce invariant 2 — if size diff > 1, move top across

Find median in O(1):

median():
    if max-heap.size == min-heap.size:  return (max-heap.top + min-heap.top) / 2
    elif max-heap.size > min-heap.size: return max-heap.top
    else:                                return min-heap.top

Why it’s O(log n) per insert

Step 1 is a heap push — O(log n). Steps 2 and 3 do at most one heap pop and one heap push each — O(log n) more. Total: O(log n) per insert, O(1) per median query.

Compare to the naive approaches:

ApproachInsertMedian
Unsorted arrayO(1)O(n) — pick the median
Sorted arrayO(n) — shift everythingO(1)
Balanced BSTO(log n)O(log n)
Two heapsO(log n)O(1)

For streaming median, two heaps are best on both axes.

A complete implementation

This is the canonical solution to Find Median from Data Stream.

The three-line dancepush lo → push hi from lo → maybe push lo from hi — handles both invariants with no special-casing. Every push goes through max-heap first; the rebalance after always leaves us with lo.size() ∈ {hi.size(), hi.size() + 1}. That’s the convention “max-heap gets the extra element” — flipping it works fine if you prefer.

Variant 1: K-th order statistic from a stream

Same idea, just shifted off-center. To maintain the K-th smallest value of a stream:

  • Max-heap of size K holds the K smallest.
  • Min-heap of size (n − K) holds the rest.
  • Top of the max-heap is the K-th smallest at any moment.

Equivalent to the size-K-max-heap trick from Kth Largest Element, just with explicit storage for the “large half.”

Variant 2: Sliding-window median

When elements leave the window as well as enter, things get harder. You can’t just pop an arbitrary element from a heap.

The lazy-deletion trick: allow the heap to contain stale entries, but skip them on pop. Maintain a hash map of “values currently scheduled for deletion.” When popping, check the top against the delete map and discard until you reach a live entry.

from collections import defaultdict
import heapq
 
class SlidingMedian:
    def __init__(self, k):
        self.k = k
        self.lo = []                                    # max-heap (negated)
        self.hi = []                                    # min-heap
        self.delayed = defaultdict(int)                 # value -> count to delete
        self.lo_size = self.hi_size = 0
 
    def prune(self, heap):
        """Pop any tops marked for delayed deletion."""
        sign = -1 if heap is self.lo else 1
        while heap:
            num = sign * heap[0]
            if self.delayed[num] > 0:
                self.delayed[num] -= 1
                heapq.heappop(heap)
            else:
                break
    # … add/remove operations follow the same shape with delayed deletion

This pattern shows up in LeetCode 480 — Sliding Window Median, which is harder than the streaming version. The bookkeeping is annoying; the core insight is the same.

Variant 3: IPO / Maximum Capital (two heaps for two priorities)

In LeetCode 502 (IPO), you have two priority queues:

  • Min-heap by capital cost holds projects you can’t afford yet.
  • Max-heap by profit holds projects you can afford right now.

On each turn, drain affordable projects from min-heap to max-heap, then take the most-profitable one from the max-heap.

This is two heaps each holding the entire dataset, used to switch between two sort orders fast. Same idea as median maintenance — just with different priorities on each side.

Variant 4: Find Right Interval, schedule conflicts

Algorithms that maintain two “frontiers” — events starting vs events ending — often use two heaps. Meeting Rooms II (which we’ll cover in a later chapter) is the classic example: one heap of meeting start times, one of end times, sweep through and count concurrent meetings.

When not to use two heaps

The two-heap pattern is elegant but it has weaknesses:

  • You can’t query arbitrary order statistics in O(log n). Only the median (or the K-th, if K is fixed). For the 30th percentile and the 70th percentile simultaneously, you’d need order statistics trees or a B-tree.
  • Deletion of arbitrary elements is messy. The lazy-deletion trick works but doubles the code complexity.
  • Sorted iteration isn’t supported. For “give me the K-th smallest at any K,” use a Binary Indexed Tree on rank or a sorted container.

When the requirements outgrow heaps, look at:

  • Order statistics tree (a BST augmented with subtree sizes).
  • Sorted containers — Python’s sortedcontainers.SortedList, C++‘s policy-based ordered set, Java’s TreeMap.
  • Fenwick tree / BIT on value buckets when values fit in a small range.

Summary

One heap = O(1) access to one extreme. Two heaps = O(1) access to the middle.

The pattern’s three uses:

  1. Median maintenance — max-heap below, min-heap above, median at the seam.
  2. Two-side priority queues — switch between cheapest and most-profitable, or by-start and by-end.
  3. Lazy deletion — let stale entries linger until they bubble up, then skip them.

Now apply it on Find Median from Data Stream.