String Algorithms & Techniques
Strings introduce a set of algorithmic techniques beyond what you saw with arrays. Pattern matching, character frequency analysis, and the sliding window on characters are the core skills that separate those who struggle with string problems from those who solve them in minutes.
Naive Pattern Matching
The simplest way to find a pattern in a text: try every position and check character by character.
- Analogy: You lost your house key somewhere in a row of boxes. You open each box, check if the key pattern matches — if not, move to the next box.
#include <vector>
#include <string>
using namespace std;
vector<int> naiveSearch(const string& text, const string& pattern) {
vector<int> positions;
int n = text.length(), m = pattern.length();
for (int i = 0; i <= n - m; i++) {
int j = 0;
while (j < m && text[i + j] == pattern[j]) {
j++;
}
if (j == m) {
positions.push_back(i); // Pattern found at index i
}
}
return positions;
}- Time Complexity: O((n - m + 1) _ m) which simplifies to O(n _ m)
- Space Complexity: O(1) extra (excluding output)
- When to use: Small inputs, quick prototyping, or when the interviewer says “start with brute force”
KMP (Knuth-Morris-Pratt) Algorithm
KMP avoids re-examining characters by preprocessing the pattern to build a “failure function” (also called the prefix table or LPS array). When a mismatch occurs, instead of going back to the start, we skip ahead using the failure function.
- Analogy: Imagine you are comparing two rulers. When you find a mismatch, instead of starting over completely, you notice that part of what you already matched is also a prefix of the pattern — so you slide the pattern forward just enough to align that repeated prefix.
The LPS (Longest Proper Prefix which is also Suffix) Array:
For pattern "ABCABD", the LPS array is [0, 0, 0, 1, 2, 0].
LPS[3] = 1because"ABCA"has prefix"A"that equals suffix"A"(length 1)LPS[4] = 2because"ABCAB"has prefix"AB"that equals suffix"AB"(length 2)
#include <vector>
#include <string>
using namespace std;
vector<int> buildLPS(const string& pattern) {
int m = pattern.length();
vector<int> lps(m, 0);
int len = 0; // Length of previous longest prefix suffix
int i = 1;
while (i < m) {
if (pattern[i] == pattern[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1]; // Fall back (don't increment i)
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
vector<int> kmpSearch(const string& text, const string& pattern) {
int n = text.length(), m = pattern.length();
vector<int> lps = buildLPS(pattern);
vector<int> positions;
int i = 0, j = 0;
while (i < n) {
if (text[i] == pattern[j]) {
i++;
j++;
}
if (j == m) {
positions.push_back(i - j);
j = lps[j - 1];
} else if (i < n && text[i] != pattern[j]) {
if (j != 0)
j = lps[j - 1];
else
i++;
}
}
return positions;
}- Time Complexity: O(n + m) — linear!
- Space Complexity: O(m) for the LPS array
- When to use: Large texts with repeated patterns, when O(n*m) is too slow
Interview reality: You rarely need to code KMP from scratch in interviews. But understanding why it works (the LPS array) is frequently asked. Focus on being able to explain the concept and compute the LPS array by hand.
Rabin-Karp Algorithm
Rabin-Karp uses a rolling hash to quickly compare the pattern with substrings of the text. Instead of comparing characters one by one, it compares hash values. Only when hashes match does it verify character by character.
- Analogy: Instead of reading every book title to find a match, you compare the number of letters first (a quick hash). Only if the count matches do you actually read the title.
Rolling Hash Formula:
For a window of text t[i..i+m-1], the hash is:
hash = (t[i] * d^(m-1) + t[i+1] * d^(m-2) + ... + t[i+m-1]) mod qWhen sliding the window by one position:
new_hash = (d * (old_hash - t[i] * d^(m-1)) + t[i+m]) mod qThis recalculation takes O(1) — that is the power of rolling hash.
#include <vector>
#include <string>
using namespace std;
vector<int> rabinKarp(const string& text, const string& pattern) {
int n = text.length(), m = pattern.length();
int d = 256; // Number of characters in alphabet
int q = 101; // A prime number for hashing
vector<int> result;
int h = 1; // d^(m-1) % q
for (int i = 0; i < m - 1; i++)
h = (h * d) % q;
// Calculate initial hashes
int patternHash = 0, textHash = 0;
for (int i = 0; i < m; i++) {
patternHash = (d * patternHash + pattern[i]) % q;
textHash = (d * textHash + text[i]) % q;
}
// Slide the pattern over text
for (int i = 0; i <= n - m; i++) {
if (patternHash == textHash) {
// Hash match -- verify characters
if (text.substr(i, m) == pattern) {
result.push_back(i);
}
}
// Calculate hash for next window
if (i < n - m) {
textHash = (d * (textHash - text[i] * h) + text[i + m]) % q;
if (textHash < 0) textHash += q;
}
}
return result;
}- Time Complexity: O(n + m) average, O(n * m) worst case (hash collisions)
- Space Complexity: O(1) extra
- When to use: Searching for multiple patterns, plagiarism detection, DNA sequence matching
Two Pointers for Palindrome Checking
The most common string technique in interviews: start from both ends and walk inward, comparing characters.
- Analogy: Two people reading the same word from opposite ends of a banner. If they always see the same letter, it is a palindrome.
bool isPalindrome(const string& s) {
int left = 0, right = s.length() - 1;
while (left < right) {
if (s[left] != s[right]) return false;
left++;
right--;
}
return true;
}
// With alphanumeric filtering (LeetCode-style)
bool isPalindromeClean(const string& s) {
int left = 0, right = s.length() - 1;
while (left < right) {
while (left < right && !isalnum(s[left])) left++;
while (left < right && !isalnum(s[right])) right--;
if (tolower(s[left]) != tolower(s[right])) return false;
left++;
right--;
}
return true;
}- Time Complexity: O(n) — single pass from both ends
- Space Complexity: O(1)
Sliding Window on Strings
The sliding window technique from arrays works even better on strings. It is the go-to approach for substring problems.
Key pattern: Maintain a window [left, right) and a character frequency map. Expand right to include more characters, shrink left when a condition is violated.
Example: Longest Substring Without Repeating Characters
int lengthOfLongestSubstring(const string& s) {
unordered_set<char> window;
int left = 0, maxLen = 0;
for (int right = 0; right < s.length(); right++) {
while (window.count(s[right])) {
window.erase(s[left]);
left++;
}
window.insert(s[right]);
maxLen = max(maxLen, right - left + 1);
}
return maxLen;
}- Time Complexity: O(n) — each character is added and removed at most once
- Space Complexity: O(min(n, k)) where k is the character set size
Character Frequency Counting
The bread and butter of string problems. Counting character frequencies with an array or hash map unlocks solutions to anagrams, permutations, and character-based comparisons.
// Check if two strings are anagrams
bool isAnagram(const string& s, const string& t) {
if (s.length() != t.length()) return false;
int freq[26] = {0};
for (char c : s) freq[c - 'a']++;
for (char c : t) freq[c - 'a']--;
for (int i = 0; i < 26; i++) {
if (freq[i] != 0) return false;
}
return true;
}- Time Complexity: O(n) — single pass to count, single pass to compare
- Space Complexity: O(1) — fixed 26-element array for lowercase English
Choosing the Right Technique
| Problem Pattern | Technique | Time |
|---|---|---|
| ”Find pattern in text” | Naive / KMP / Rabin-Karp | O(n*m) / O(n+m) |
| “Is it a palindrome?” | Two Pointers from ends | O(n) |
| “Longest/shortest substring with property X” | Sliding Window | O(n) |
| “Are these anagrams?” | Character Frequency Array | O(n) |
| “Group similar strings” | Sort each string or frequency key | O(n * k log k) |
| “All permutations of pattern in text” | Sliding Window + Frequency | O(n) |
Mental model: If the problem is about a substring (contiguous), think Sliding Window. If it is about character composition (order does not matter), think Frequency Counting. If it is about matching from both ends, think Two Pointers.