Find Median from Data Stream hard
Description
The median of a sorted list is the middle value if the list has odd length, or the average of the two middle values if even. Design a data structure that supports:
addNum(int num)β add an integer from the stream.double findMedian()β return the median of all numbers seen so far.
Each addNum should be efficient (better than re-sorting), and findMedian should be O(1).
Examples
> Case 1:
add(1); add(2); findMedian() β 1.5
add(3); findMedian() β 2.0
add(4); findMedian() β 2.5
add(5); findMedian() β 3.0Constraints
-10^5 <= num <= 10^5- At most
5 Β· 10^4calls to each method. - Follow-up: What if all numbers are between 0 and 100? What if 99% of them are?
Why naive approaches fail
| Approach | addNum | findMedian |
|---|---|---|
| Unsorted list | O(1) | O(n) |
| Sorted list (insertion) | O(n) β shift | O(1) |
| Balanced BST | O(log n) | O(log n) |
| Two heaps | O(log n) | O(1) |
For high call counts, two heaps win on both axes.
The two-heap pattern
This is the canonical use of the two-heap pattern. Two heaps split the values:
- Max-heap
loβ holds the smaller half. Top = largest of the smaller half. - Min-heap
hiβ holds the larger half. Top = smallest of the larger half.
Two invariants:
- Every value in
loβ€ every value inhi. (Partition is correct.) lo.size()is either equal tohi.size()or one greater. (Convention: max-heap holds the extra.)
When both hold, the median is:
lo.top()iflohas the extra element.(lo.top() + hi.top()) / 2.0if sizes are equal.
The dance
Every insert is three lines:
1. push num onto lo (max-heap)
2. move lo.top() to hi (min-heap) # enforces invariant 1
3. if hi.size() > lo.size(): move hi.top() back to lo # enforces invariant 2Step 1 always pushes; step 2 always rebalances the partition; step 3 only fires if hi grew past lo. The three steps together guarantee both invariants no matter the input order.
Code
The order of the three steps matters. Push to lo first, then migrate to hi. If you push directly to whichever heap looks emptier, you can break invariant 1 β a value smaller than hi.top() ending up in hi. Always push to one fixed heap and let the rebalance dance maintain the partition.
Trace on [1, 2, 3, 4, 5]
add(1): lo=[1], hi=[] β median = 1
add(2): lo=[1], hi=[2] β median = (1+2)/2 = 1.5
add(3): lo=[2,1], hi=[3] β median = 2
add(4): lo=[2,1], hi=[3,4] β median = (2+3)/2 = 2.5
add(5): lo=[3,1,2], hi=[4,5] β median = 3(Heap arrays arenβt sorted; only the tops are positioned. The displayed order is one valid array representation.)
Follow-up: bounded value range (0..100)
If we know values are bounded by a small integer range, we can use a bucket count array of size 101. Maintain the running total count, then on findMedian() walk the cumulative count to find the median position.
addNum: O(1)findMedian: O(range) = O(101) = O(1) in practice
For values bounded by 100, this beats two heaps. For unbounded values, two heaps remain the best general-purpose solution.
Follow-up: 99% of numbers in 0..100
A hybrid: bucket count for the 99% within range, plus a balanced data structure (e.g., another heap or sorted list) for the 1% outliers. On findMedian, account for both. This is the kind of trick interviewers reward when they push you with the βwhat ifβ follow-ups.
Analysis
- Time: O(log n) per addNum, O(1) per findMedian.
- Space: O(n) β both heaps together hold all numbers ever added.
Find Median is the flagship problem for the two-heap pattern. Once you know the dance, it generalizes to sliding-window median (with lazy deletion), order-statistic queries, and any βtrack the middle of a moving rangeβ problem.