Longest Duplicate Substring hard
Description
Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.
Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".
Examples
> Case 1:
s = "banana"
Output: "ana"
Explanation: "ana" appears at indices 1 and 3 (overlapping).
> Case 2:
s = "abcd"
Output: ""
Explanation: No substring appears more than once.Constraints
2 <= s.length <= 3 * 10^4sconsists of lowercase English letters.
State design / Intuition
Binary search on the answer length.
The set of valid lengths has the property: if a duplicate exists for length k, it also exists for all lengths < k (any prefix of the duplicate is also a duplicate). This monotone property makes binary search valid.
For each candidate length L, check if any substring of length L appears more than once using a rolling hash (Rabin-Karp):
- Compute the hash of
s[0..L-1]. - Slide the window: in O(1), compute
hash(s[i..i+L-1])fromhash(s[i-1..i+L-2]). - Store each hash in a set; if we see a repeat, a duplicate exists.
To avoid false positives from hash collisions, verify with actual string comparison when a hash is seen twice. For this problem, double hashing (using two independent hash functions) makes false positives astronomically unlikely without the verification overhead.
Finding the actual substring: when the binary search converges to the longest duplicate length L, re-run the hash scan at length L and return the actual substring.
Code
Why binary search + rolling hash beats brute force: Brute force (check all pairs of substrings) is O(n³). Suffix array finds the answer in O(n log n) construction + O(n) scan of the LCP array, but requires a complex implementation. Binary search + rolling hash hits O(n log n) time with much simpler code — the standard competitive-programming trade-off.
Analysis
- Time: O(n log n) — O(log n) binary search iterations, each O(n) rolling hash scan.
- Space: O(n) — the hash set storing positions.
Same skin
- Longest Common Substring of Two Strings — binary search on length + rolling hash on the concatenation.
- Find Duplicate File in System — same hash-based duplicate detection.
- Longest Repeated Character Replacement — sliding window variant (different validity condition).
- Longest Palindromic Substring — different problem structure (palindrome vs. duplicate), but rolling hash appears in some O(n log n) palindrome finders.