Advanced String Patterns

The longest-palindrome problem has a famous O(n²) solution — expand around every center — that everyone learns and most people think is optimal. It isn’t. There’s an O(n) algorithm (Manacher’s) that’s reputed to be black magic, and a multi-pattern matcher (Aho-Corasick) that finds ten thousand needles in one haystack pass. Both are the same idea you just learned — “don’t re-do work you’ve already verified” — wearing new costumes. This page strips the costumes off.

The three algorithms on the previous pages (KMP, Rabin-Karp, Z-algorithm) cover single-pattern search in linear time. This page introduces three more techniques that come up in harder problems:

Manacher’s algorithm — longest palindromic substring in O(n). The palindrome version of KMP’s “don’t restart from scratch.”
Aho-Corasick — multi-pattern search in O(n + total pattern length + matches). KMP on a trie.
Suffix arrays — a sorted array of all suffixes, enabling O(m log n) search for any pattern and O(n log n) construction. The data structure behind many “hard” string problems.

1. Manacher’s Algorithm

The palindrome problem

Find the longest palindromic substring of s.

The brute-force expand-around-center approach tries every center, expands while characters match, and is O(n²) (n centers × n expansion per center worst case). Manacher’s does it in O(n) using the same Z-box principle.

The mirror trick

The key insight: if you’re inside a palindrome, your mirror position around the palindrome’s center gives you a free match length. Specifically, if we’re computing the radius p[i] at center i, and i is inside a known palindrome centered at c with radius r (so c - r <= i <= c + r), then p[i] >= min(p[2*c - i], c + r - i).

2*c - i is the mirror of i around c.
p[2*c - i] is the known palindrome radius at the mirror.
c + r - i is how much room we have inside the known palindrome.

We take the minimum because if the mirror’s palindrome extends beyond the current palindrome’s edge, we can only guarantee matches up to the edge.

The odd/even unification trick

To handle both odd-length and even-length palindromes with one pass, insert a separator character (e.g., '#') between every character and at both ends:

s = "abacaba"
T = "#a#b#a#c#a#b#a#"

Now every palindrome in T is odd-length (since we inserted characters between every pair), and an even-length palindrome in s corresponds to a palindrome centered on a '#'.

Predict first

You're computing the palindrome radius at center i, and i sits inside a known palindrome centered at c with edge r. The mirror position has a known radius p[mirror]. Why can't you just copy p[mirror] — why does the code take min(r - i, p[mirror])?

Implementation

Time: O(n). Space: O(n).

The mirror trick is one line — structurally identical to the Z-box’s. Rebuild it from memory:

python · fill in the blanks0/1 hints

def longest_palindrome(s):
  T = '#' + '#'.join(s) + '#'
  n = len(T)
  p = [0] * n
  c = r = 0
  for i in range(1, n - 1):
      mirror = 2 * c - i
      if i < r:
          # ??? free radius from the mirror, capped at the palindrome's edge
      while i + p[i] + 1 < n and i - p[i] - 1 >= 0 and T[i + p[i] + 1] == T[i - p[i] - 1]:
          p[i] += 1
      if i + p[i] > r:
          c, r = i, i + p[i]
  center = p.index(max(p))
  radius = p[center]
  start = (center - radius) // 2
  return s[start: start + radius]

It’s Z[i] = min(r - i, Z[i - l]) from the previous page with palindrome swapped in for prefix. Same proof, same O(n), same r-never-decreases amortization.

The argument is the same as the Z-algorithm: the r pointer never decreases, every expansion increments r by 1, and r is bounded by |T| = 2n + 1. Total expansions across all i: O(n).

Manacher’s is the string algorithm most frequently cited as “magic” in interviews. It’s not. Once you see the Z-box analogy — “if I’m inside a known palindrome, my mirror tells me a free radius” — the code follows directly. The #-insertion trick is just bookkeeping to avoid separate even/odd cases.

2. Aho-Corasick: multi-pattern search

The problem

Given a text T and a set of patterns P1, P2, ..., Pk, find all occurrences of all patterns in T.

Running KMP separately for each pattern takes O(n × k + total_pattern_length) — linear in the text for each pattern. If k = 10,000, that’s expensive. Aho-Corasick does it in O(n + total_pattern_length + number_of_matches) — one pass over the text, finding all pattern occurrences simultaneously.

The idea

Build a trie of all patterns. Then build failure links on the trie (exactly KMP’s failure function, but on the trie structure). Each failure link points from a node back to the longest proper suffix of the current path that is also a prefix of some pattern.

During the search, run the text through the automaton: on each character, follow a trie edge if one exists, or follow failure links until you find a matching edge or reach the root. Every time you reach a node that marks a complete pattern, record a match.

The automaton processes each character in O(1) amortized (same argument as KMP). The total traversal is O(n).

from collections import deque
 
def build_aho_corasick(patterns):
    # Trie: each node is {char: child_idx}, plus fail and output
    trie = [{'fail': 0, 'output': []}]
    goto = [{}]
 
    for pid, p in enumerate(patterns):
        node = 0
        for ch in p:
            if ch not in goto[node]:
                goto.append({})
                trie.append({'fail': 0, 'output': []})
                goto[node][ch] = len(trie) - 1
            node = goto[node][ch]
        trie[node]['output'].append(pid)
 
    q = deque()
    for ch, child in goto[0].items():
        trie[child]['fail'] = 0
        q.append(child)
 
    while q:
        u = q.popleft()
        for ch, v in goto[u].items():
            fail = trie[u]['fail']
            while fail and ch not in goto[fail]:
                fail = trie[fail]['fail']
            trie[v]['fail'] = goto[fail].get(ch, 0) if fail else goto[0].get(ch, 0)
            if trie[v]['fail'] == v:
                trie[v]['fail'] = 0
            trie[v]['output'] += trie[trie[v]['fail']]['output']
            q.append(v)
 
    return trie, goto
 
def aho_corasick_search(T, patterns):
    trie, goto = build_aho_corasick(patterns)
    node, results = 0, []
    for i, ch in enumerate(T):
        while node and ch not in goto[node]:
            node = trie[node]['fail']
        node = goto[node].get(ch, 0)
        for pid in trie[node]['output']:
            results.append((i - len(patterns[pid]) + 1, pid))
    return results

Aho-Corasick is the algorithm behind Unix grep (especially fgrep) and many intrusion detection systems.

3. Suffix Arrays: a conceptual sketch

What is a suffix array?

For a string s of length n, the suffix array SA is a permutation of [0, 1, ..., n-1] such that s[SA[0]..] < s[SA[1]..] < ... < s[SA[n-1]..] — the suffixes sorted in lexicographic order.

For s = "banana":

Rank	Suffix	Starting index
0	a	5
1	ana	3
2	anana	1
3	banana	0
4	na	4
5	nana	2

SA = [5, 3, 1, 0, 4, 2].

Pattern search with a suffix array

To find all occurrences of pattern P in s, binary search on the suffix array. Any occurrence of P in s is a prefix of some suffix — so the occurrences form a contiguous range in the sorted suffix array. Two binary searches (lower bound and upper bound) find that range in O(m log n).

Construction

Naïve construction (sort all suffixes by comparison) is O(n² log n) due to O(n) comparison cost. The standard O(n log n) algorithm uses prefix doubling: sort by pairs of length-1, then length-2, then length-4, etc., doubling the comparison key each round. After log n rounds, the full sort is done.

An O(n) construction (SA-IS algorithm) exists but is complex; the O(n log n) version is standard in competitive programming.

LCP array

The Longest Common Prefix (LCP) array LCP[i] stores the length of the longest common prefix between SA[i-1] and SA[i] (adjacent suffixes in sorted order). It can be computed in O(n) from the suffix array using the Kasai algorithm.

Together, the suffix array and LCP array answer many “substring queries” in O(n log n) total preprocessing and O(m log n) per query:

Find pattern in text
Count distinct substrings: n*(n+1)/2 - sum(LCP)
Longest repeated substring: max(LCP)
Longest common substring of two strings

For interview purposes, Rabin-Karp + binary search covers most suffix array applications. The suffix array itself appears in competitive programming more than in FAANG interviews.

Summary: when to use what

Problem	Best algorithm
Find one pattern in text	KMP or Z-algorithm
Find pattern, need multi-pattern generalization	Rabin-Karp
Find all patterns simultaneously	Aho-Corasick
Longest palindromic substring	Manacher’s
Duplicate substring detection	Binary search + Rabin-Karp
Many queries on the same text	Suffix array + LCP
Competitive programming string	Suffix array + Aho-Corasick

Quick check

In Manacher's algorithm, what does the p[i] array represent?

What is Aho-Corasick's key advantage over running KMP independently for each of k patterns?

How does a suffix array enable O(m log n) pattern search (instead of the naïve O(nm))?

Manacher’s is the Z-algorithm’s box trick on palindromes; Aho-Corasick is KMP’s failure function on a trie. Nothing here is new magic — it’s the same “carry forward verified work” idea you’ve now seen four times, which is exactly why it’s worth recognizing. Next: Basic Questions — warm-up exercises to lock these patterns into muscle memory before the full practice set.

Z-Algorithm Basic Questions

Finished this page?