Linear-Time Sorts

“No sort can beat O(n log n)” — you proved that on the intro page. So how do counting and radix sort run in O(n)? They don’t break the theorem; they sidestep it by never comparing two elements, using the values themselves as array indices. That trick is unbeatable when its one assumption holds — and a disastrous trap the moment it doesn’t. Knowing exactly where that line sits is the whole skill.

The comparison-sort lower bound says no algorithm that compares pairs of elements can sort in less than O(n log n) time. It’s a theorem, not an engineering challenge — you can’t out-clever it.

But you can cheat. If you stop comparing and instead use the values themselves as indices, you escape the lower bound entirely. These are the non-comparison sorts: Counting, Radix, and Bucket Sort. When their assumptions hold, they run in O(n) — flat-out faster than any comparison sort.

The catch is always the same: they only work on certain kinds of input.

1. Counting Sort

Use it when: keys are integers in a small range [0, k].

The idea is dead simple: for each possible value v in the range, count how many times it appears, then write that many copies of v to the output in order.

The algorithm

Count

Walk through the input. count[v] += 1 for each value v.

Prefix-sum the counts

Convert count[v] to “number of elements <= v.” Now count[v] - 1 tells you the rightmost position where v belongs.

Place in reverse

Walk the input right to left (this preserves stability). For each element x, decrement count[x] and write x to output[count[x]].

Input:  [2, 5, 3, 0, 2, 3, 0, 3]    k = 5

Step 1 — Count:
        count[0] = 2
        count[1] = 0
        count[2] = 2
        count[3] = 3
        count[4] = 0
        count[5] = 1

Step 2 — Prefix sum:
        count[0] = 2
        count[1] = 2
        count[2] = 4
        count[3] = 7
        count[4] = 7
        count[5] = 8

Step 3 — Place from the right (stability preserved):
        x = 3 → count[3] = 6, output[6] = 3
        x = 0 → count[0] = 1, output[1] = 0
        x = 3 → count[3] = 5, output[5] = 3
        ... and so on.

Output: [0, 0, 2, 2, 3, 3, 3, 5]

Code

Recall the placement step — the line that writes each element into its final slot (and why it runs right-to-left):

python · fill in the blanks0/1 hints

def counting_sort(a, k):
  count = [0] * (k + 1)
  for x in a:
      count[x] += 1
  for i in range(1, k + 1):
      count[i] += count[i - 1]          # prefix sums become final positions
  out = [0] * len(a)
  for i in range(len(a) - 1, -1, -1):   # right-to-left preserves stability
      count[a[i]] -= 1
      # ??? write x into its computed position
  return out

Properties

Time: O(n + k). Linear in the input and the value range.
Space: O(n + k).
Stable: Yes — guaranteed by the right-to-left placement.
Comparison-free: Yes.

When it shines (and when it doesn’t)

Perfect for: counts, frequencies, ages, grades, anything with a small bounded range. Sort Colors is basically counting sort with k = 3.

Terrible for: arbitrary 32-bit integers (k = 4 billion). Or floating-point numbers. Or strings. The count array would be huge or undefined.

Predict first

Counting sort is O(n + k). You need to sort 100 integers, each somewhere in [0, 1,000,000,000]. Is counting sort a good idea here?

⚠️

Counting sort is only linear if k = O(n). If k >> n (say, n = 100 integers in [0, 10^9]), counting sort is O(10^9) — much worse than merge sort. The “linear-time” claim is true only when the range is small.

2. Radix Sort

Use it when: keys are integers (or fixed-length strings), and you don’t want the k = O(n) constraint of counting sort.

Radix sort splits each key into its digits (base 10, or base 256 for bytes, or base 2 for bits) and sorts one digit at a time, from least significant to most significant. Each pass is a stable sort (typically counting sort) by that one digit.

The algorithm

Pick a base

Often base 10 for textbook examples, base 256 for byte-by-byte sorting, or base 2 for bits.

Sort by least-significant digit (LSD)

Use a stable sort (counting sort works great — the range is just 0..base-1).

Repeat for each more significant digit

After d passes (where d = number of digits in the largest key), the array is fully sorted.

Input: [170, 45, 75, 90, 802, 24, 2, 66]

After sorting by ones digit:   [170, 90, 802, 2, 24, 45, 75, 66]
After sorting by tens digit:   [802, 2, 24, 45, 66, 170, 75, 90]
After sorting by hundreds:     [2, 24, 45, 66, 75, 90, 170, 802]

Why does this work? Because each pass is stable, the partial ordering from previous passes is preserved within elements that have the same digit on the current pass.

Code (LSD radix sort, base 10)

Properties

Time: O(n · d) where d is the number of digits. For 32-bit integers in base 256, d = 4 — effectively O(n).
Space: O(n + b) where b is the base.
Stable: Yes, as long as the inner sort is stable.
Comparison-free: Yes.

When to actually use it

For bulk sorting of fixed-width integers, radix sort is unbeatable. It’s used in:

Database index building
Genome sequence sorting (4-symbol alphabet → tiny base)
Some flash-memory storage controllers
Network packet processing (sorting by IP address)

For small arrays or general data, the constant factors of radix sort (multiple passes, byte extraction overhead) make it slower than quicksort.

3. Bucket Sort

Use it when: keys are roughly uniformly distributed over a known range (typically floats in [0, 1)).

The idea: divide the range into n equal-sized buckets, drop each element into the bucket where it belongs, sort each bucket individually (usually with insertion sort), then concatenate.

Input: [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12]   n = 8

Bucket index for x = floor(n * x):
  0.17 → bucket 1
  0.78 → bucket 6
  0.39 → bucket 3
  ...

Buckets:  [_, [0.17, 0.12], [0.21, 0.26], [0.39], _, _, [0.78, 0.72], [_, ...], [0.94]]
                ↓ sort each bucket
Concat:    [0.12, 0.17, 0.21, 0.26, 0.39, 0.72, 0.78, 0.94]

Code

Properties

Time: O(n) average if the input is uniformly distributed. O(n²) worst case if all elements land in one bucket.
Space: O(n).
Stable: Stable if the per-bucket sort is stable.
Comparison-free: Within each bucket, yes — the buckets themselves use position as the order.

When to use it

Uniformly distributed floating-point data — sensor readings, normalized scores, probabilities. Also useful as a preprocessing step for nearest-neighbor lookups when you can pre-bin your data.

Summary table

Sort	Time	Space	Stable	Works on…
Counting	O(n + k)	O(n + k)	✓	Small-range integers
Radix	O(n · d)	O(n + b)	✓	Fixed-width integers / strings
Bucket	O(n) avg	O(n)	✓	Uniformly distributed values

All three break the O(n log n) ceiling — but only when their input assumptions are satisfied. When they apply, they’re unbeatable. When they don’t, fall back to quicksort or merge sort.

Quick check

Radix sort processes digits least-significant first, using counting sort on each digit. Why must that per-digit sort be STABLE?

You’ve now seen every major sort — three O(n log n) comparison sorts and three O(n) non-comparison sorts. Next: choosing a sort — a decision guide that turns all of this into “given these constraints, use that sort.”

Heap Sort Choosing a Sort

Finished this page?