Alien Dictionary hard
Description
There is a new alien language that uses the English alphabet. However, the order among the letters is unknown to you.
You are given a list of strings words from the alien language’s dictionary, where the strings in words are sorted lexicographically by the rules of this new language.
Return a string of the unique letters in the new alien language sorted in lexicographically increasing order by the new language’s rules. If there is no solution, return "". If there are multiple solutions, return any of them.
Examples
> Case 1:
words = ["wrt", "wrf", "er", "ett", "rftt"]
Output: "wertf"
> Case 2:
words = ["z", "x"]
Output: "zx"
> Case 3:
words = ["z", "x", "z"]
Output: ""
Explanation: the order is invalid — no solution.Constraints
1 <= words.length <= 1001 <= words[i].length <= 100words[i]consists of only lowercase English letters.
State design
This is the boss-level topo-sort problem. Two phases:
- Build the precedence DAG. For each adjacent pair
(words[i], words[i+1]), find the first differing character — that gives one directed edge “earlier-letter → later-letter”. Also handle the prefix violation edge case ("abc"before"ab"is impossible). - Topo sort the resulting graph to recover the alphabet order.
The hard part is edge extraction — getting the DAG right.
Code
The three classic Alien Dictionary bugs:
- Forgetting the prefix violation check. If
a = "abc"andb = "ab", the input is inconsistent — no valid ordering. Output"". - Adding duplicate edges. Adding the same
(u, v)edge multiple times inflatesindeg[v]and the topo sort never terminates correctly. Use a set, or check before inserting. - Missing characters. Every character that appears in any word must be in the output. Don’t only add characters that get edges — initialize
indeg[c] = 0for every letter encountered.
Analysis
- Time:
O(C + sum of word lengths)whereCis the number of distinct characters (at most 26). - Space:
O(C²)for the adjacency map (bounded by 26² edges).
Same skin
- Verifying an Alien Dictionary — given a proposed alphabet, check whether a word list is sorted under it. Single pass, no topo sort.
- Sequence Reconstruction — same shape: build a DAG from short ordered constraints, topo-sort, verify uniqueness.
- Build a Matrix with Conditions — two independent topo sorts (rows + cols) from precedence constraints.
- Loud and Rich — DFS on a precedence DAG with memoization.
This is the canonical “real-world topo sort” interview problem. The algorithmic core is mechanical once you’ve done Kahn’s; the difficulty is entirely in correctly extracting the DAG from the constraints.