Linear-Time Sorts

The comparison-sort lower bound says no algorithm that compares pairs of elements can sort in less than O(n log n) time. It’s a theorem, not an engineering challenge — you can’t out-clever it.

But you can cheat. If you stop comparing and instead use the values themselves as indices, you escape the lower bound entirely. These are the non-comparison sorts: Counting, Radix, and Bucket Sort. When their assumptions hold, they run in O(n) — flat-out faster than any comparison sort.

The catch is always the same: they only work on certain kinds of input.

1. Counting Sort

Use it when: keys are integers in a small range [0, k].

The idea is dead simple: for each possible value v in the range, count how many times it appears, then write that many copies of v to the output in order.

The algorithm

Count

Walk through the input. count[v] += 1 for each value v.

Prefix-sum the counts

Convert count[v] to “number of elements <= v.” Now count[v] - 1 tells you the rightmost position where v belongs.

Place in reverse

Walk the input right to left (this preserves stability). For each element x, decrement count[x] and write x to output[count[x]].

Input:  [2, 5, 3, 0, 2, 3, 0, 3]    k = 5

Step 1 — Count:
        count[0] = 2
        count[1] = 0
        count[2] = 2
        count[3] = 3
        count[4] = 0
        count[5] = 1

Step 2 — Prefix sum:
        count[0] = 2
        count[1] = 2
        count[2] = 4
        count[3] = 7
        count[4] = 7
        count[5] = 8

Step 3 — Place from the right (stability preserved):
        x = 3 → count[3] = 6, output[6] = 3
        x = 0 → count[0] = 1, output[1] = 0
        x = 3 → count[3] = 5, output[5] = 3
        ... and so on.

Output: [0, 0, 2, 2, 3, 3, 3, 5]

Code

Properties

  • Time: O(n + k). Linear in the input and the value range.
  • Space: O(n + k).
  • Stable: Yes — guaranteed by the right-to-left placement.
  • Comparison-free: Yes.

When it shines (and when it doesn’t)

Perfect for: counts, frequencies, ages, grades, anything with a small bounded range. Sort Colors is basically counting sort with k = 3.

Terrible for: arbitrary 32-bit integers (k = 4 billion). Or floating-point numbers. Or strings. The count array would be huge or undefined.

⚠️

Counting sort is only linear if k = O(n). If k >> n (say, n = 100 integers in [0, 10^9]), counting sort is O(10^9) — much worse than merge sort. The “linear-time” claim is true only when the range is small.

2. Radix Sort

Use it when: keys are integers (or fixed-length strings), and you don’t want the k = O(n) constraint of counting sort.

Radix sort splits each key into its digits (base 10, or base 256 for bytes, or base 2 for bits) and sorts one digit at a time, from least significant to most significant. Each pass is a stable sort (typically counting sort) by that one digit.

The algorithm

Pick a base

Often base 10 for textbook examples, base 256 for byte-by-byte sorting, or base 2 for bits.

Sort by least-significant digit (LSD)

Use a stable sort (counting sort works great — the range is just 0..base-1).

Repeat for each more significant digit

After d passes (where d = number of digits in the largest key), the array is fully sorted.

Input: [170, 45, 75, 90, 802, 24, 2, 66]

After sorting by ones digit:   [170, 90, 802, 2, 24, 45, 75, 66]
After sorting by tens digit:   [802, 2, 24, 45, 66, 170, 75, 90]
After sorting by hundreds:     [2, 24, 45, 66, 75, 90, 170, 802]

Why does this work? Because each pass is stable, the partial ordering from previous passes is preserved within elements that have the same digit on the current pass.

Code (LSD radix sort, base 10)

Properties

  • Time: O(n · d) where d is the number of digits. For 32-bit integers in base 256, d = 4 — effectively O(n).
  • Space: O(n + b) where b is the base.
  • Stable: Yes, as long as the inner sort is stable.
  • Comparison-free: Yes.

When to actually use it

For bulk sorting of fixed-width integers, radix sort is unbeatable. It’s used in:

  • Database index building
  • Genome sequence sorting (4-symbol alphabet → tiny base)
  • Some flash-memory storage controllers
  • Network packet processing (sorting by IP address)

For small arrays or general data, the constant factors of radix sort (multiple passes, byte extraction overhead) make it slower than quicksort.

3. Bucket Sort

Use it when: keys are roughly uniformly distributed over a known range (typically floats in [0, 1)).

The idea: divide the range into n equal-sized buckets, drop each element into the bucket where it belongs, sort each bucket individually (usually with insertion sort), then concatenate.

Input: [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12]   n = 8

Bucket index for x = floor(n * x):
  0.17 → bucket 1
  0.78 → bucket 6
  0.39 → bucket 3
  ...

Buckets:  [_, [0.17, 0.12], [0.21, 0.26], [0.39], _, _, [0.78, 0.72], [_, ...], [0.94]]
                ↓ sort each bucket
Concat:    [0.12, 0.17, 0.21, 0.26, 0.39, 0.72, 0.78, 0.94]

Code

Properties

  • Time: O(n) average if the input is uniformly distributed. O(n²) worst case if all elements land in one bucket.
  • Space: O(n).
  • Stable: Stable if the per-bucket sort is stable.
  • Comparison-free: Within each bucket, yes — the buckets themselves use position as the order.

When to use it

Uniformly distributed floating-point data — sensor readings, normalized scores, probabilities. Also useful as a preprocessing step for nearest-neighbor lookups when you can pre-bin your data.

Summary table

SortTimeSpaceStableWorks on…
CountingO(n + k)O(n + k)Small-range integers
RadixO(n · d)O(n + b)Fixed-width integers / strings
BucketO(n) avgO(n)Uniformly distributed values

All three break the O(n log n) ceiling — but only when their input assumptions are satisfied. When they apply, they’re unbeatable. When they don’t, fall back to quicksort or merge sort.

Now you’ve seen all the sorts. The next page tells you how to pick the right one for a real problem.