Linear-Time Sorts
The comparison-sort lower bound says no algorithm that compares pairs of elements can sort in less than O(n log n) time. It’s a theorem, not an engineering challenge — you can’t out-clever it.
But you can cheat. If you stop comparing and instead use the values themselves as indices, you escape the lower bound entirely. These are the non-comparison sorts: Counting, Radix, and Bucket Sort. When their assumptions hold, they run in O(n) — flat-out faster than any comparison sort.
The catch is always the same: they only work on certain kinds of input.
1. Counting Sort
Use it when: keys are integers in a small range [0, k].
The idea is dead simple: for each possible value v in the range, count how many times it appears, then write that many copies of v to the output in order.
The algorithm
Count
Walk through the input. count[v] += 1 for each value v.
Prefix-sum the counts
Convert count[v] to “number of elements <= v.” Now count[v] - 1 tells you the rightmost position where v belongs.
Place in reverse
Walk the input right to left (this preserves stability). For each element x, decrement count[x] and write x to output[count[x]].
Input: [2, 5, 3, 0, 2, 3, 0, 3] k = 5
Step 1 — Count:
count[0] = 2
count[1] = 0
count[2] = 2
count[3] = 3
count[4] = 0
count[5] = 1
Step 2 — Prefix sum:
count[0] = 2
count[1] = 2
count[2] = 4
count[3] = 7
count[4] = 7
count[5] = 8
Step 3 — Place from the right (stability preserved):
x = 3 → count[3] = 6, output[6] = 3
x = 0 → count[0] = 1, output[1] = 0
x = 3 → count[3] = 5, output[5] = 3
... and so on.
Output: [0, 0, 2, 2, 3, 3, 3, 5]Code
Properties
- Time: O(n + k). Linear in the input and the value range.
- Space: O(n + k).
- Stable: Yes — guaranteed by the right-to-left placement.
- Comparison-free: Yes.
When it shines (and when it doesn’t)
Perfect for: counts, frequencies, ages, grades, anything with a small bounded range. Sort Colors is basically counting sort with k = 3.
Terrible for: arbitrary 32-bit integers (k = 4 billion). Or floating-point numbers. Or strings. The count array would be huge or undefined.
Counting sort is only linear if k = O(n). If k >> n (say, n = 100 integers in [0, 10^9]), counting sort is O(10^9) — much worse than merge sort. The “linear-time” claim is true only when the range is small.
2. Radix Sort
Use it when: keys are integers (or fixed-length strings), and you don’t want the k = O(n) constraint of counting sort.
Radix sort splits each key into its digits (base 10, or base 256 for bytes, or base 2 for bits) and sorts one digit at a time, from least significant to most significant. Each pass is a stable sort (typically counting sort) by that one digit.
The algorithm
Pick a base
Often base 10 for textbook examples, base 256 for byte-by-byte sorting, or base 2 for bits.
Sort by least-significant digit (LSD)
Use a stable sort (counting sort works great — the range is just 0..base-1).
Repeat for each more significant digit
After d passes (where d = number of digits in the largest key), the array is fully sorted.
Input: [170, 45, 75, 90, 802, 24, 2, 66]
After sorting by ones digit: [170, 90, 802, 2, 24, 45, 75, 66]
After sorting by tens digit: [802, 2, 24, 45, 66, 170, 75, 90]
After sorting by hundreds: [2, 24, 45, 66, 75, 90, 170, 802]Why does this work? Because each pass is stable, the partial ordering from previous passes is preserved within elements that have the same digit on the current pass.
Code (LSD radix sort, base 10)
Properties
- Time: O(n · d) where
dis the number of digits. For 32-bit integers in base 256,d = 4— effectively O(n). - Space: O(n + b) where
bis the base. - Stable: Yes, as long as the inner sort is stable.
- Comparison-free: Yes.
When to actually use it
For bulk sorting of fixed-width integers, radix sort is unbeatable. It’s used in:
- Database index building
- Genome sequence sorting (4-symbol alphabet → tiny base)
- Some flash-memory storage controllers
- Network packet processing (sorting by IP address)
For small arrays or general data, the constant factors of radix sort (multiple passes, byte extraction overhead) make it slower than quicksort.
3. Bucket Sort
Use it when: keys are roughly uniformly distributed over a known range (typically floats in [0, 1)).
The idea: divide the range into n equal-sized buckets, drop each element into the bucket where it belongs, sort each bucket individually (usually with insertion sort), then concatenate.
Input: [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12] n = 8
Bucket index for x = floor(n * x):
0.17 → bucket 1
0.78 → bucket 6
0.39 → bucket 3
...
Buckets: [_, [0.17, 0.12], [0.21, 0.26], [0.39], _, _, [0.78, 0.72], [_, ...], [0.94]]
↓ sort each bucket
Concat: [0.12, 0.17, 0.21, 0.26, 0.39, 0.72, 0.78, 0.94]Code
Properties
- Time: O(n) average if the input is uniformly distributed. O(n²) worst case if all elements land in one bucket.
- Space: O(n).
- Stable: Stable if the per-bucket sort is stable.
- Comparison-free: Within each bucket, yes — the buckets themselves use position as the order.
When to use it
Uniformly distributed floating-point data — sensor readings, normalized scores, probabilities. Also useful as a preprocessing step for nearest-neighbor lookups when you can pre-bin your data.
Summary table
| Sort | Time | Space | Stable | Works on… |
|---|---|---|---|---|
| Counting | O(n + k) | O(n + k) | ✓ | Small-range integers |
| Radix | O(n · d) | O(n + b) | ✓ | Fixed-width integers / strings |
| Bucket | O(n) avg | O(n) | ✓ | Uniformly distributed values |
All three break the O(n log n) ceiling — but only when their input assumptions are satisfied. When they apply, they’re unbeatable. When they don’t, fall back to quicksort or merge sort.
Now you’ve seen all the sorts. The next page tells you how to pick the right one for a real problem.