Top K Frequent Elements medium

Description

Given an integer array nums and an integer k, return the k most frequent elements. You may return them in any order.

Examples

> Case 1:
    Input:  nums = [1, 1, 1, 2, 2, 3],   k = 2
    Output: [1, 2]
    # 1 appears 3 times, 2 appears 2 times — the top 2 most frequent.
 
> Case 2:
    Input:  nums = [1],   k = 1
    Output: [1]

Constraints

  • 1 <= nums.length <= 10^5
  • k is in the valid range [1, number of distinct elements]
  • Aim for better than O(n log n) time.

Step 1 (every approach): count frequencies

Use a hash map. This is O(n) and gives us a frequency dictionary freq mapping each value to its count.

from collections import Counter
freq = Counter(nums)

Now we need the top-k by value of freq. Three reasonable strategies, in order of cleverness.

Approach 1: Sort the items by frequency

def top_k_frequent(nums, k):
    freq = Counter(nums)
    return [v for v, _ in sorted(freq.items(), key=lambda kv: -kv[1])[:k]]

Time: O(d log d) where d is the number of distinct values. Simple, correct, and slow.

Approach 2: Min-heap of size k

Same trick as Kth Largest. Push items onto a min-heap keyed by frequency; pop when the heap grows past k. After processing every distinct value, the heap holds the top-k.

Time: O(n + d log k). Space: O(d + k).

Standard interview answer. Clean, generalizes to streaming inputs (you can’t fit them all in memory but you can keep a size-k heap).

Approach 3: Bucket sort (the O(n) trick)

Here’s the killer observation: the frequency of any value is between 1 and n. So if we make an array buckets[] of size n + 1 where buckets[f] holds the list of values that appeared f times, we can:

  1. Count frequencies → O(n).
  2. Fill the buckets → O(d).
  3. Walk buckets from index n down to 0, collecting values until we have k → O(n).

Total: O(n). No heap, no sort.

This is essentially counting sort of values by their frequency. The “small range of integers” assumption that makes counting sort work is hiding in plain sight: frequencies are bounded by n.

The bucket-sort solution is the rare “you should have used counting sort here” moment. Once you notice the frequencies are bounded, the O(n) solution falls out.

Analysis comparison

ApproachTimeSpaceWhen to use
Sort frequenciesO(d log d)O(d)When you don’t know any better.
Min-heapO(n + d log k)O(d)Streaming inputs, general workhorse.
Bucket sortO(n)O(n)When you spot the bounded-frequency trick.

In an interview, walk through all three — and mention the bucket-sort O(n) trick even if you implement the heap version. Demonstrating that you see the optimal solution is half the win.