Advanced String Patterns
The three algorithms on the previous pages (KMP, Rabin-Karp, Z-algorithm) cover single-pattern search in linear time. This page introduces three more techniques that come up in harder problems:
- Manacher’s algorithm — longest palindromic substring in O(n). The palindrome version of KMP’s “don’t restart from scratch.”
- Aho-Corasick — multi-pattern search in O(n + total pattern length + matches). KMP on a trie.
- Suffix arrays — a sorted array of all suffixes, enabling O(m log n) search for any pattern and O(n log n) construction. The data structure behind many “hard” string problems.
1. Manacher’s Algorithm
The palindrome problem
Find the longest palindromic substring of
s.
The brute-force expand-around-center approach tries every center, expands while characters match, and is O(n²) (n centers × n expansion per center worst case). Manacher’s does it in O(n) using the same Z-box principle.
The mirror trick
The key insight: if you’re inside a palindrome, your mirror position around the palindrome’s center gives you a free match length. Specifically, if we’re computing the radius p[i] at center i, and i is inside a known palindrome centered at c with radius r (so c - r <= i <= c + r), then p[i] >= min(p[2*c - i], c + r - i).
2*c - iis the mirror ofiaroundc.p[2*c - i]is the known palindrome radius at the mirror.c + r - iis how much room we have inside the known palindrome.
We take the minimum because if the mirror’s palindrome extends beyond the current palindrome’s edge, we can only guarantee matches up to the edge.
The odd/even unification trick
To handle both odd-length and even-length palindromes with one pass, insert a separator character (e.g., '#') between every character and at both ends:
s = "abacaba"
T = "#a#b#a#c#a#b#a#"Now every palindrome in T is odd-length (since we inserted characters between every pair), and an even-length palindrome in s corresponds to a palindrome centered on a '#'.
Implementation
Time: O(n). Space: O(n).
The argument is the same as the Z-algorithm: the r pointer never decreases, every expansion increments r by 1, and r is bounded by |T| = 2n + 1. Total expansions across all i: O(n).
Manacher’s is the string algorithm most frequently cited as “magic” in interviews. It’s not. Once you see the Z-box analogy — “if I’m inside a known palindrome, my mirror tells me a free radius” — the code follows directly. The #-insertion trick is just bookkeeping to avoid separate even/odd cases.
2. Aho-Corasick: multi-pattern search
The problem
Given a text
Tand a set of patternsP1, P2, ..., Pk, find all occurrences of all patterns inT.
Running KMP separately for each pattern takes O(n × k + total_pattern_length) — linear in the text for each pattern. If k = 10,000, that’s expensive. Aho-Corasick does it in O(n + total_pattern_length + number_of_matches) — one pass over the text, finding all pattern occurrences simultaneously.
The idea
Build a trie of all patterns. Then build failure links on the trie (exactly KMP’s failure function, but on the trie structure). Each failure link points from a node back to the longest proper suffix of the current path that is also a prefix of some pattern.
During the search, run the text through the automaton: on each character, follow a trie edge if one exists, or follow failure links until you find a matching edge or reach the root. Every time you reach a node that marks a complete pattern, record a match.
The automaton processes each character in O(1) amortized (same argument as KMP). The total traversal is O(n).
from collections import deque
def build_aho_corasick(patterns):
# Trie: each node is {char: child_idx}, plus fail and output
trie = [{'fail': 0, 'output': []}]
goto = [{}]
for pid, p in enumerate(patterns):
node = 0
for ch in p:
if ch not in goto[node]:
goto.append({})
trie.append({'fail': 0, 'output': []})
goto[node][ch] = len(trie) - 1
node = goto[node][ch]
trie[node]['output'].append(pid)
q = deque()
for ch, child in goto[0].items():
trie[child]['fail'] = 0
q.append(child)
while q:
u = q.popleft()
for ch, v in goto[u].items():
fail = trie[u]['fail']
while fail and ch not in goto[fail]:
fail = trie[fail]['fail']
trie[v]['fail'] = goto[fail].get(ch, 0) if fail else goto[0].get(ch, 0)
if trie[v]['fail'] == v:
trie[v]['fail'] = 0
trie[v]['output'] += trie[trie[v]['fail']]['output']
q.append(v)
return trie, goto
def aho_corasick_search(T, patterns):
trie, goto = build_aho_corasick(patterns)
node, results = 0, []
for i, ch in enumerate(T):
while node and ch not in goto[node]:
node = trie[node]['fail']
node = goto[node].get(ch, 0)
for pid in trie[node]['output']:
results.append((i - len(patterns[pid]) + 1, pid))
return resultsAho-Corasick is the algorithm behind Unix grep (especially fgrep) and many intrusion detection systems.
3. Suffix Arrays: a conceptual sketch
What is a suffix array?
For a string s of length n, the suffix array SA is a permutation of [0, 1, ..., n-1] such that s[SA[0]..] < s[SA[1]..] < ... < s[SA[n-1]..] — the suffixes sorted in lexicographic order.
For s = "banana":
| Rank | Suffix | Starting index |
|---|---|---|
| 0 | a | 5 |
| 1 | ana | 3 |
| 2 | anana | 1 |
| 3 | banana | 0 |
| 4 | na | 4 |
| 5 | nana | 2 |
SA = [5, 3, 1, 0, 4, 2].
Pattern search with a suffix array
To find all occurrences of pattern P in s, binary search on the suffix array. Any occurrence of P in s is a prefix of some suffix — so the occurrences form a contiguous range in the sorted suffix array. Two binary searches (lower bound and upper bound) find that range in O(m log n).
Construction
Naïve construction (sort all suffixes by comparison) is O(n² log n) due to O(n) comparison cost. The standard O(n log n) algorithm uses prefix doubling: sort by pairs of length-1, then length-2, then length-4, etc., doubling the comparison key each round. After log n rounds, the full sort is done.
An O(n) construction (SA-IS algorithm) exists but is complex; the O(n log n) version is standard in competitive programming.
LCP array
The Longest Common Prefix (LCP) array LCP[i] stores the length of the longest common prefix between SA[i-1] and SA[i] (adjacent suffixes in sorted order). It can be computed in O(n) from the suffix array using the Kasai algorithm.
Together, the suffix array and LCP array answer many “substring queries” in O(n log n) total preprocessing and O(m log n) per query:
- Find pattern in text
- Count distinct substrings:
n*(n+1)/2 - sum(LCP) - Longest repeated substring:
max(LCP) - Longest common substring of two strings
For interview purposes, Rabin-Karp + binary search covers most suffix array applications. The suffix array itself appears in competitive programming more than in FAANG interviews.
Summary: when to use what
| Problem | Best algorithm |
|---|---|
| Find one pattern in text | KMP or Z-algorithm |
| Find pattern, need multi-pattern generalization | Rabin-Karp |
| Find all patterns simultaneously | Aho-Corasick |
| Longest palindromic substring | Manacher’s |
| Duplicate substring detection | Binary search + Rabin-Karp |
| Many queries on the same text | Suffix array + LCP |
| Competitive programming string | Suffix array + Aho-Corasick |
Quick check
Next: Basic Questions — warm-up exercises to lock the string patterns into muscle memory.