🚀 Phases 1–5 are live — Days 1–17 cover the foundations and the algorithmic patterns. See the roadmap →

Longest Duplicate Substring hard

Description

Given a string s, consider all duplicated substrings: (contiguous) substrings of s that occur 2 or more times. The occurrences may overlap.

Return any duplicated substring that has the longest possible length. If s does not have a duplicated substring, the answer is "".

Examples

> Case 1:
    s = "banana"
    Output: "ana"
    Explanation: "ana" appears at indices 1 and 3 (overlapping).
 
> Case 2:
    s = "abcd"
    Output: ""
    Explanation: No substring appears more than once.

Constraints

  • 2 <= s.length <= 3 * 10^4
  • s consists of lowercase English letters.

State design / Intuition

Binary search on the answer length.

The set of valid lengths has the property: if a duplicate exists for length k, it also exists for all lengths < k (any prefix of the duplicate is also a duplicate). This monotone property makes binary search valid.

For each candidate length L, check if any substring of length L appears more than once using a rolling hash (Rabin-Karp):

  1. Compute the hash of s[0..L-1].
  2. Slide the window: in O(1), compute hash(s[i..i+L-1]) from hash(s[i-1..i+L-2]).
  3. Store each hash in a set; if we see a repeat, a duplicate exists.

To avoid false positives from hash collisions, verify with actual string comparison when a hash is seen twice. For this problem, double hashing (using two independent hash functions) makes false positives astronomically unlikely without the verification overhead.

Finding the actual substring: when the binary search converges to the longest duplicate length L, re-run the hash scan at length L and return the actual substring.

Code

Why binary search + rolling hash beats brute force: Brute force (check all pairs of substrings) is O(n³). Suffix array finds the answer in O(n log n) construction + O(n) scan of the LCP array, but requires a complex implementation. Binary search + rolling hash hits O(n log n) time with much simpler code — the standard competitive-programming trade-off.

Analysis

  • Time: O(n log n) — O(log n) binary search iterations, each O(n) rolling hash scan.
  • Space: O(n) — the hash set storing positions.

Same skin

  • Longest Common Substring of Two Strings — binary search on length + rolling hash on the concatenation.
  • Find Duplicate File in System — same hash-based duplicate detection.
  • Longest Repeated Character Replacement — sliding window variant (different validity condition).
  • Longest Palindromic Substring — different problem structure (palindrome vs. duplicate), but rolling hash appears in some O(n log n) palindrome finders.