🚀 Phases 1–5 are live — Days 1–17 cover the foundations and the algorithmic patterns. See the roadmap →

Introduction to Backtracking

Backtracking is the algorithm pattern you use when there’s no clever recurrence, no greedy swap, no DP table — when the only way forward is to actually try the possibilities, but in a way that doesn’t take forever.

Concretely: you grow a partial solution one step at a time. At each step you have a small set of legal choices. You pick one, recurse, and either reach a complete solution (success!) or hit a dead end. When you hit a dead end, you undo the last choice and try the next legal one. That undo step — the “back” in backtracking — is the whole game.

The decision tree

Every backtracking problem implicitly defines a tree of partial solutions. The root is the empty solution. Each child is one more choice on top of its parent. The leaves are either complete solutions or dead ends.

Take generating all subsets of [1, 2, 3]. At every element, we choose to include it or skip it:

Subset decision tree for [1, 2, 3] — '+k' = include, '-k' = skip
[][]+1[]+1+2[]+1+2+3[]+1+2-3[]+1-2[]+1-2+3[]+1-2-3[]-1[]-1+2[]-1+2+3[]-1+2-3[]-1-2[]-1-2+3[]-1-2-3
recursive call    base case15 total calls

Eight leaves, eight subsets. The “tree” exists only in the call stack — you don’t materialize it, you walk it.

This is the universal shape. Backtracking is depth-first walking of a decision tree, with an undo between siblings.

Brute force vs backtracking — the difference

Both try every possibility. The difference is when they detect that a partial solution is doomed.

  • Brute force builds a complete candidate, then checks if it’s valid. For an 8 × 8 board, that’s 64⁸ ≈ 280 trillion complete configurations to check for N-Queens. Hopeless.
  • Backtracking checks validity as it builds. The moment a partial solution can’t possibly extend to a valid full solution, it bails out and tries something else. For N-Queens, this trims the search from trillions to fewer than 2000 nodes.

The technical name for this trick is pruning, and we’ll spend an entire page on it. But the intuition is: fail fast, fail high in the tree, never explore a subtree you can prove is dead.

Without pruning, backtracking is just slow brute force with a fancy name. With pruning, it’s one of the most powerful general-purpose search techniques in your toolbox — solving puzzles in microseconds that brute force can’t finish in a millennium.

When to reach for backtracking

The tell-tale signs:

  • “Find all …” — all subsets, all permutations, all valid combinations, all paths. The output is itself a list of solutions, so you have to enumerate them.
  • “Does there exist …” — does there exist a valid Sudoku completion? A way to color the graph? A path through the maze? The state space is too irregular for DP, but the partial-state check is cheap.
  • “Smallest / largest valid configuration …” — N-Queens with the minimum number of pieces, the shortest word ladder, the smallest Sudoku puzzle with a unique solution.
  • Constraints are local — “no two queens on the same diagonal,” “no repeated character,” “every box of nine cells contains 1–9.” Local constraints make pruning cheap and effective.
  • n is small — usually n <= 20 for unrestricted problems, sometimes n <= 100 for problems with heavy pruning. If n is large, the only DP, greedy, or polynomial-time algorithm will work.

If two or three of these match, reach for the template.

Backtracking vs DP vs greedy — pick the right hammer

A common point of confusion. Here’s the disambiguation:

TechniqueWhen subproblems repeatWhen optimal substructureWhen you need ALL solutions
DPYes — caching pays offYes — combine subanswersAwkward — DP returns ONE answer or a count
GreedyDoesn’t matter — never reconsidersYes, plus exchange argumentNo — commits early
BacktrackingNo — every branch is distinctNot required — just feasibilityYes — its natural output

DP and backtracking are not enemies. Memoized backtracking is just DP written top-down with extra branching — when subproblems repeat and you want one optimum, lean toward DP. When they don’t repeat or you need every valid solution, lean toward backtracking.

A worked example end-to-end: generate all subsets

Given a set of distinct integers nums, return all possible subsets (the power set).

Five questions to set up any backtracking problem:

What’s the partial solution?

A list of the elements picked so far — call it path. Starts empty.

What’s a complete solution?

Once we’ve made a decision about every element of nums, path is one valid subset. We record a copy of it.

What are the choices at each step?

At index i, we have two choices: include nums[i] in path, or skip it.

Trivial — there are no constraints on subsets, every choice is legal. Pruning won’t help here.

What’s the un-do?

After recursing with nums[i] included, pop it off path to restore state before trying the skip branch.

The code transcribes the answers:

Time: O(2^n × n) — there are 2^n subsets and copying each one takes O(n). Space: O(n) recursion stack plus O(2^n × n) for the output.

That’s the whole template. The next page formalizes it into the three-line skeleton you can drop into every problem on this page.

The two outputs: “all of them” vs “the best one”

Backtracking problems split into two flavors:

  • Enumerate: return all valid solutions. Code reaches a leaf, records a copy of path, returns up the tree to find more. Subsets, permutations, combinations, generate-parentheses.
  • Search: return one solution (or the optimum), then stop. Code returns true upward through the call stack as soon as a leaf works. N-Queens with a single solution, Sudoku, word-ladder reachability.

The template is the same for both — what changes is what happens at the leaf:

on success:
   enumerate flavor → record path, keep searching
   search flavor    → return true up the stack, stop exploring

Knowing which flavor a problem asks for tells you whether to return early or keep going. Read the problem statement carefully.

Quick check

What's the defining step that distinguishes backtracking from pure brute force?
You see: 'return all valid placements of 8 queens on an 8x8 chessboard such that none attack each other.' Which technique fits?
A subset of n elements has 2^n possibilities; a permutation has n!. Which one explodes faster?

Next: the template — the three-line skeleton, plus the four canonical shapes (subsets, permutations, combinations, partitions) built on top of it.