πŸš€ Phases 1–5 are live β€” Days 1–17 cover the foundations and the algorithmic patterns. See the roadmap β†’

KMP Algorithm

The Knuth-Morris-Pratt algorithm (1977) solves the pattern-matching problem in O(n + m) time β€” O(m) to preprocess the pattern, O(n) to scan the text β€” with a guarantee that the text pointer never moves backward. Every character is read at most twice.

The key insight: when a mismatch occurs at T[i] vs P[j], we don’t need to reset j to 0. We can reset j to a smaller value β€” the length of the longest proper prefix of P[0..j-1] that is also a suffix. That value is precomputed for every j and stored in an array called the failure function (also called the prefix function, or lps for β€œLongest Proper Prefix which is also Suffix”).

The failure function

For a pattern P of length m, define fail[j] as the length of the longest proper prefix of P[0..j] that is also a suffix of P[0..j].

β€œProper” means strictly shorter than the whole string β€” we don’t count P[0..j] itself.

Example: P = "ABABCABAB"

jP[0..j]Longest proper prefix = suffixfail[j]
0A(none)0
1AB(none)0
2ABA”A”1
3ABAB”AB”2
4ABABC(none)0
5ABABCA”A”1
6ABABCAB”AB”2
7ABABCABA”ABA”3
8ABABCABAB”ABAB”4

The pattern fail = [0, 0, 1, 2, 0, 1, 2, 3, 4].

Building the failure function

The failure function can be built in O(m) using the observation that fail[j+1] can be derived from fail[j]:

def build_fail(P):
    m = len(P)
    fail = [0] * m
    k = 0                         # length of current matching prefix
    for j in range(1, m):
        while k > 0 and P[k] != P[j]:
            k = fail[k - 1]       # fall back
        if P[k] == P[j]:
            k += 1
        fail[j] = k
    return fail

The outer loop runs m - 1 times. The inner while loop decreases k by at least 1 each iteration, and k can only increase by 1 per outer loop step. So the total number of k decreases is bounded by the total increases β€” which is at most m - 1. Total work: O(m).

The failure function construction is itself a KMP search: you’re searching for each prefix of P inside P itself. The recursion fail[k-1] is the key step β€” if P[k] != P[j], we fall back to the longest proper prefix-suffix of the shorter prefix, and try matching P there. This self-referential property is what makes the analysis non-obvious but the code simple.

With fail precomputed, the search is clean:

def kmp_search(T, P):
    n, m = len(T), len(P)
    fail = build_fail(P)
    matches = []
    j = 0                          # characters matched so far
    for i in range(n):
        while j > 0 and T[i] != P[j]:
            j = fail[j - 1]        # fall back without moving i
        if T[i] == P[j]:
            j += 1
        if j == m:                 # full match
            matches.append(i - m + 1)
            j = fail[j - 1]        # continue searching for overlapping matches
    return matches

The text pointer i only moves forward. The pattern pointer j can move backward via fail, but the total backward movement is bounded by the total forward movement (same amortized argument as above). Total comparisons: at most 2n.

Walkthrough: T = "ABABCABABD", P = "ABABD"

Pattern fail = [0, 0, 1, 2, 0].

Pattern Matching: "ABABD" in "ABABCABABD"
Step 1 of 11
Text:
A
B
A
B
C
A
B
A
B
D
Pattern:
A
B
A
B
D
T[0]='A' vs P[0]='A'. Match. j=1.

The critical moment is at step 5: after the mismatch at T[4]='C' vs P[4]='D', KMP doesn’t restart from T[1]. It knows that T[2..3] = "AB" = P[0..1] (because fail[3] = 2), so it resumes matching from P[2] β€” without re-reading any text. The text pointer never moved back.

Full implementation

The amortized argument

The i pointer strictly increases β€” n increments total. The j pointer increases by 1 each time T[i] == P[j] (at most n times) and decreases via fail each mismatch. Since j can’t go below 0 and every decrease came from a prior increase, the total decreases are at most n. Total j movements: at most 2n. Total comparisons: at most 2n. KMP is O(n) for the search phase.

The preprocessing phase runs the same logic on P alone β€” O(m).

Grand total: O(n + m), O(m) space.

The failure function is powerful beyond pattern matching:

1. Does a string have a repeated structure?

A string s of length n is a repetition of a shorter string of length k if and only if k divides n and n % (n - fail[n-1]) == 0.

def smallest_period(s):
    fail = build_fail(s)
    n = len(s)
    k = n - fail[n - 1]
    return k if n % k == 0 else n

For s = "ABABABAB", fail[-1] = 6, k = 8 - 6 = 2. Since 8 % 2 == 0, the period is 2 ("AB").

2. Shortest palindrome by KMP on the reverse

To find the shortest palindrome that starts with s, concatenate s + '#' + reversed(s) and compute the failure function. fail[-1] gives the length of the longest prefix of s that is also a palindrome prefix (because it’s a prefix of s that matches a suffix of reversed(s)). This directly solves LeetCode #214 Shortest Palindrome.

3. Count of pattern occurrences in text including overlapping

The fail[j-1] at the end of a full match handles overlapping occurrences automatically. The kmp_search above already counts them correctly.

Complexity

PhaseTimeSpace
Build failure functionO(m)O(m)
SearchO(n)O(1) (plus O(m) for fail array)
TotalO(n + m)O(m)

Quick check

For pattern P = 'AABAA', what is fail[4]?
After a full match at the end of kmp_search, why do we set j = fail[j-1] instead of j = 0?
Why does the failure function run in O(m) even though it contains a while loop inside a for loop?

Next: Rabin-Karp β€” rolling polynomial hash, O(1) window comparisons, and why it’s the right tool for multi-pattern search and duplicate substring detection.