Alien Dictionary

Alien Dictionary hard

Description

There is a new alien language that uses the English alphabet. However, the order among the letters is unknown to you.

You are given a list of strings words from the alien language’s dictionary, where the strings in words are sorted lexicographically by the rules of this new language.

Return a string of the unique letters in the new alien language sorted in lexicographically increasing order by the new language’s rules. If there is no solution, return "". If there are multiple solutions, return any of them.

Examples

> Case 1:
    words = ["wrt", "wrf", "er", "ett", "rftt"]
    Output: "wertf"
 
> Case 2:
    words = ["z", "x"]
    Output: "zx"
 
> Case 3:
    words = ["z", "x", "z"]
    Output: ""
    Explanation: the order is invalid — no solution.

Constraints

1 <= words.length <= 100
1 <= words[i].length <= 100
words[i] consists of only lowercase English letters.

State design

This is the boss-level topo-sort problem. Two phases:

Build the precedence DAG. For each adjacent pair (words[i], words[i+1]), find the first differing character — that gives one directed edge “earlier-letter → later-letter”. Also handle the prefix violation edge case ("abc" before "ab" is impossible).
Topo sort the resulting graph to recover the alphabet order.

The hard part is edge extraction — getting the DAG right.

Try it yourself

⌨ Alien Dictionary — recover the letter order (or '')

def alien_order(words):
  # for each adjacent pair, the first differing char gives an edge earlier -> later;
  # watch the prefix-violation edge case, then topo-sort. return "" if impossible
  pass

Hint

Initialize indegree 0 for every letter that appears. For each adjacent word pair, add one edge from the first differing character; a longer word that is a prefix of the next is a contradiction. Then run Kahn's. These tests have a single forced chain so the order is unique.

Reveal solution

Code

⚠️

The three classic Alien Dictionary bugs:

Forgetting the prefix violation check. If a = "abc" and b = "ab", the input is inconsistent — no valid ordering. Output "".
Adding duplicate edges. Adding the same (u, v) edge multiple times inflates indeg[v] and the topo sort never terminates correctly. Use a set, or check before inserting.
Missing characters. Every character that appears in any word must be in the output. Don’t only add characters that get edges — initialize indeg[c] = 0 for every letter encountered.

Analysis

Time: O(C + sum of word lengths) where C is the number of distinct characters (at most 26).
Space: O(C²) for the adjacency map (bounded by 26² edges).

Same skin

Verifying an Alien Dictionary — given a proposed alphabet, check whether a word list is sorted under it. Single pass, no topo sort.
Sequence Reconstruction — same shape: build a DAG from short ordered constraints, topo-sort, verify uniqueness.
Build a Matrix with Conditions — two independent topo sorts (rows + cols) from precedence constraints.
Loud and Rich — DFS on a precedence DAG with memoization.

This is the canonical “real-world topo sort” interview problem. The algorithmic core is mechanical once you’ve done Kahn’s; the difficulty is entirely in correctly extracting the DAG from the constraints.

Parallel Courses Sequence Reconstruction

Finished this page?