Insertion Sort

Here’s a puzzle: insertion sort is O(n²), asymptotically worse than merge sort and quicksort — yet every major standard library (Python’s Timsort, C++‘s introsort, Rust’s PDQ) calls it internally. Why would the pros deliberately use the “slow” sort? Two reasons hide in this page: it gets faster the more sorted the input is, and at small sizes its constant factors crush the fancy sorts.

This is how you sort a hand of playing cards. Pick up cards one at a time and insert each into its correct spot among the cards you’ve already arranged. The “sorted prefix” grows by one with every card you draw.

It’s the same O(n²) family as bubble and selection — but in practice, insertion sort is much faster. So much faster that every standard library in the world keeps it around for small inputs.

The idea in three sentences

Maintain a sorted prefix of the array. Index 0 is trivially a sorted prefix of length 1.
Take the next element (the “key”) and shift the sorted prefix’s elements right one at a time until the key’s position is found.
Drop the key into place.

Watch it run

insertion_sort([5, 1, 4, 2, 8, 3])

Step 1 / 18 — Index 0 is trivially sorted.

comparingswappingsortedpivotcurrent min

Notice how the green sorted prefix grows by one each iteration. Each element migrates leftward only as far as it has to — that’s why insertion sort gets so cheap on nearly-sorted inputs.

The code

Note how the inner loop shifts rather than swaps. Each shift is one assignment, whereas a swap is three. That tiny constant-factor difference adds up across n² iterations.

Recall the shift that opens a gap for the key:

python · fill in the blanks0/1 hints

def insertion_sort(a):
  for i in range(1, len(a)):
      key = a[i]
      j = i - 1
      while j >= 0 and a[j] > key:
          # ??? slide the bigger element one slot right
          j -= 1
      a[j + 1] = key           # drop the key into the gap

Why it shines on nearly-sorted data

Predict first

Insertion sort is O(n²) worst case. What's its running time on an array that's ALREADY sorted — and why?

If the input is already sorted, the inner while loop never executes — every key is already in its correct place. The whole algorithm runs in O(n): one pass, no shifts.

More generally, if every element is at most k positions away from its sorted position, insertion sort runs in O(n·k). For k = 1 that’s linear time. This adaptive behavior is rare among sorts — it’s what makes insertion sort indispensable for nearly-sorted inputs.

Real-world example: sorting newly arriving data into an already-sorted list. Each new arrival shifts at most a few positions. Insertion sort handles this in O(n) per insertion; most other sorts would re-sort from scratch in O(n log n).

Properties

Time: O(n²) average and worst case. O(n) best case (already sorted).
Space: O(1) — in-place.
Stable: Yes. The a[j] > key comparison is strict, so equal elements never overtake each other.
Adaptive: Yes — runtime scales with how out-of-order the input is.

The “production sort” subroutine

Look inside any modern sort:

Timsort (Python, Java for objects): Insertion sort on runs of up to 32–64 elements.
Introsort (C++ std::sort): Insertion sort on slices smaller than ~16 elements.
PDQ sort (Rust): Insertion sort once partitions get small enough.

They all rely on this fact: for small n (say, n < 32), insertion sort beats quicksort and merge sort in real wall-clock time despite being asymptotically slower. The reason is constant factors — insertion sort has no recursion overhead, no extra memory, and excellent cache locality. The big-O notation hides constants, but at small n, constants are everything.

When you should reach for it

Arrays with fewer than ~50 elements.
Arrays you know are nearly sorted.
As the base case of a divide-and-conquer sort (cut the recursion when slices get small).
Online sorting — inserting new elements into an already-sorted collection.

Quick check

Production sorts (Timsort, introsort) switch to insertion sort once a slice is small (n < ~32), even though it's asymptotically O(n²). Why does that beat continuing with quicksort/merge sort there?

That’s the last of the O(n²) sorts — and the one worth keeping. Next: merge sort, the first sort to break the quadratic ceiling with a divide-and-conquer O(n log n) and a rock-solid worst-case guarantee.

Selection Sort Merge Sort

Finished this page?