The Two-Heap Pattern
A single heap gives you O(1) access to one extreme. Two heaps glued together give you O(1) access to the middle.
That single observation solves a family of problems that look impossible at first: maintain the running median of a stream, run a sliding-window median, schedule capacity in a min-max optimization, and more.
The setup
Two heaps:
- A max-heap holding the smaller half of the elements. Top = the largest of the small half.
- A min-heap holding the larger half. Top = the smallest of the large half.
Together they partition the data into a “low half” and a “high half” — and the median sits at the boundary.
Max-heap (smaller half) Min-heap (larger half)
TOP TOP
┌─────┐ ┌─────┐
│ 5 │ ◄── median if odd │ 6 │
└─────┘ └─────┘
[..., 3, 1, 4] [6, 7, 8, ...]The two invariants we maintain:
- Every element in the max-heap ≤ every element in the min-heap. (The partition is correct.)
- Size difference is at most 1. Otherwise the median would be off-center.
If we hold both invariants, the median is trivially:
- The max-heap’s top (if it has the extra element).
- The min-heap’s top (if the min-heap has the extra element).
- The average of the two tops (if sizes are equal).
The algorithm
Insert in three steps:
add(x):
1. push x onto one of the heaps (typically max-heap if x is smaller,
or just always max-heap and rebalance after)
2. enforce invariant 1 — if max-heap.top > min-heap.top, pop and swap
3. enforce invariant 2 — if size diff > 1, move top acrossFind median in O(1):
median():
if max-heap.size == min-heap.size: return (max-heap.top + min-heap.top) / 2
elif max-heap.size > min-heap.size: return max-heap.top
else: return min-heap.topWhy it’s O(log n) per insert
Step 1 is a heap push — O(log n). Steps 2 and 3 do at most one heap pop and one heap push each — O(log n) more. Total: O(log n) per insert, O(1) per median query.
Compare to the naive approaches:
| Approach | Insert | Median |
|---|---|---|
| Unsorted array | O(1) | O(n) — pick the median |
| Sorted array | O(n) — shift everything | O(1) |
| Balanced BST | O(log n) | O(log n) |
| Two heaps | O(log n) | O(1) |
For streaming median, two heaps are best on both axes.
A complete implementation
This is the canonical solution to Find Median from Data Stream.
The three-line dance — push lo → push hi from lo → maybe push lo from hi — handles both invariants with no special-casing. Every push goes through max-heap first; the rebalance after always leaves us with lo.size() ∈ {hi.size(), hi.size() + 1}. That’s the convention “max-heap gets the extra element” — flipping it works fine if you prefer.
Variant 1: K-th order statistic from a stream
Same idea, just shifted off-center. To maintain the K-th smallest value of a stream:
- Max-heap of size K holds the K smallest.
- Min-heap of size (n − K) holds the rest.
- Top of the max-heap is the K-th smallest at any moment.
Equivalent to the size-K-max-heap trick from Kth Largest Element, just with explicit storage for the “large half.”
Variant 2: Sliding-window median
When elements leave the window as well as enter, things get harder. You can’t just pop an arbitrary element from a heap.
The lazy-deletion trick: allow the heap to contain stale entries, but skip them on pop. Maintain a hash map of “values currently scheduled for deletion.” When popping, check the top against the delete map and discard until you reach a live entry.
from collections import defaultdict
import heapq
class SlidingMedian:
def __init__(self, k):
self.k = k
self.lo = [] # max-heap (negated)
self.hi = [] # min-heap
self.delayed = defaultdict(int) # value -> count to delete
self.lo_size = self.hi_size = 0
def prune(self, heap):
"""Pop any tops marked for delayed deletion."""
sign = -1 if heap is self.lo else 1
while heap:
num = sign * heap[0]
if self.delayed[num] > 0:
self.delayed[num] -= 1
heapq.heappop(heap)
else:
break
# … add/remove operations follow the same shape with delayed deletionThis pattern shows up in LeetCode 480 — Sliding Window Median, which is harder than the streaming version. The bookkeeping is annoying; the core insight is the same.
Variant 3: IPO / Maximum Capital (two heaps for two priorities)
In LeetCode 502 (IPO), you have two priority queues:
- Min-heap by capital cost holds projects you can’t afford yet.
- Max-heap by profit holds projects you can afford right now.
On each turn, drain affordable projects from min-heap to max-heap, then take the most-profitable one from the max-heap.
This is two heaps each holding the entire dataset, used to switch between two sort orders fast. Same idea as median maintenance — just with different priorities on each side.
Variant 4: Find Right Interval, schedule conflicts
Algorithms that maintain two “frontiers” — events starting vs events ending — often use two heaps. Meeting Rooms II (which we’ll cover in a later chapter) is the classic example: one heap of meeting start times, one of end times, sweep through and count concurrent meetings.
When not to use two heaps
The two-heap pattern is elegant but it has weaknesses:
- You can’t query arbitrary order statistics in O(log n). Only the median (or the K-th, if K is fixed). For the 30th percentile and the 70th percentile simultaneously, you’d need order statistics trees or a B-tree.
- Deletion of arbitrary elements is messy. The lazy-deletion trick works but doubles the code complexity.
- Sorted iteration isn’t supported. For “give me the K-th smallest at any K,” use a Binary Indexed Tree on rank or a sorted container.
When the requirements outgrow heaps, look at:
- Order statistics tree (a BST augmented with subtree sizes).
- Sorted containers — Python’s
sortedcontainers.SortedList, C++‘spolicy-based ordered set, Java’sTreeMap. - Fenwick tree / BIT on value buckets when values fit in a small range.
Summary
One heap = O(1) access to one extreme. Two heaps = O(1) access to the middle.
The pattern’s three uses:
- Median maintenance — max-heap below, min-heap above, median at the seam.
- Two-side priority queues — switch between cheapest and most-profitable, or by-start and by-end.
- Lazy deletion — let stale entries linger until they bubble up, then skip them.
Now apply it on Find Median from Data Stream.