Floyd-Warshall Algorithm

All-pairs shortest paths — every vertex to every other — in five lines and O(V³), negatives allowed. Floyd-Warshall looks like a brute-force triple loop, but it’s one of the most elegant dynamic programs in CS: it switches on one “allowed intermediate vertex” at a time and asks, for every pair, would routing through this new vertex be shorter?

If Dijkstra is a Swiss Army knife and Bellman-Ford is a sledgehammer, Floyd-Warshall is a rolling pin. It’s the shortest, simplest, most direct way to compute all-pairs shortest paths, and it runs in O(V³) regardless of edge weights (negatives are fine, as long as no negative cycle exists).

It’s also one of the most elegant pieces of dynamic programming you’ll ever see.

The DP formulation

Predict first

Define dist[k][i][j] = shortest i→j path using only vertices {0..k} as intermediates. When you 'allow' one more vertex k, what new option does every pair (i, j) get to consider?

Let dist[k][i][j] = shortest distance from i to j using only vertices {0, 1, ..., k} as intermediates.

Two cases:

k is not on the shortest path from i to j. Then dist[k][i][j] = dist[k-1][i][j].
k is on the path. The path goes i → ... → k → ... → j. Both halves use only intermediates {0, ..., k-1}. So dist[k][i][j] = dist[k-1][i][k] + dist[k-1][k][j].

Take the min:

dist[k][i][j] = min(dist[k-1][i][j],
                    dist[k-1][i][k] + dist[k-1][k][j])

Base case: dist[-1][i][j] is the direct edge weight (or ∞ if no edge, 0 if i == j).

You can collapse the k dimension — at each k, the values you need on the right-hand side are exactly the ones currently in the table. Floyd-Warshall is the table compressed to 2D, with three nested loops.

The template

Time: O(V³). Space: O(V²) for the matrix.

Recall the two lines that carry the algorithm — which loop is outermost, and the “route through k” relax:

python · fill in the blanks0/2 hints

def floyd_warshall(dist):              # dist[i][j] = direct edge, inf if none, 0 on diagonal
  n = len(dist)
  # ??? the outermost loop / the relax
      for i in range(n):
          for j in range(n):
              if dist[i][k] + dist[k][j] < dist[i][j]:
                  # ??? the outermost loop / the relax

That’s it. Five lines for the core algorithm. The hardest part is remembering that k is the outermost loop — switch the order and the DP breaks.

⚠️

k must be outermost. The recurrence updates dist[i][j] using values that incorporate intermediates ≤ k. If i or j is outermost, you’ll be reading and writing the same dist cell within a single “iteration” inconsistently, and the answers will be wrong. Memorize the loop order: k, i, j.

Worked example

Three vertices, direct edges:

        0    1    2
    0 [ 0    5    ∞ ]
    1 [ ∞    0    3 ]
    2 [ 2    ∞    0 ]

After k = 0 (allow 0 as intermediate): nothing changes since row 0 has no edges going inward.

After k = 1: check dist[i][1] + dist[1][j] for every (i, j). dist[0][1] + dist[1][2] = 5 + 3 = 8, less than ∞ — update dist[0][2] = 8.

        0    1    2
    0 [ 0    5    8 ]
    1 [ ∞    0    3 ]
    2 [ 2    ∞    0 ]

After k = 2: dist[1][2] + dist[2][0] = 3 + 2 = 5, less than ∞ — update dist[1][0] = 5. dist[1][2] + dist[2][1] = 3 + ∞ = ∞, no change. Also dist[0][2] + dist[2][0] = 8 + 2 = 10, but dist[0][0] is already 0.

        0    1    2
    0 [ 0    5    8 ]
    1 [ 5    0    3 ]
    2 [ 2    7    0 ]

Wait, dist[2][1]? Through k = 1 we had nothing, through k = 0 we get dist[2][0] + dist[0][1] = 2 + 5 = 7. Yes — but that needed k = 0 AND k = 1 ordering working together. Tracing the literal DP: dist[2][1] updates at k = 0 to 7, then k = 1 finds nothing more. Final matrix is the one above.

Negative-cycle detection

After running Floyd-Warshall, check the diagonal. If dist[i][i] < 0 for any i, the graph has a negative cycle reachable from and to i — going around that cycle made the “distance from i to itself” negative.

for i in range(n):
    if dist[i][i] < 0:
        return None  # negative cycle exists

Unlike Bellman-Ford (which only detects negative cycles reachable from the source), Floyd-Warshall detects them everywhere — useful for problems where the source isn’t fixed.

Path reconstruction

Store a next[i][j] matrix during the DP, recording the next vertex on the path from i to j:

def floyd_warshall_with_path(dist):
    n = len(dist)
    nxt = [[j if dist[i][j] != float('inf') else -1
            for j in range(n)]
           for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
 
def path(nxt, i, j):
    if nxt[i][j] == -1:
        return []
    p = [i]
    while i != j:
        i = nxt[i][j]
        p.append(i)
    return p

O(V²) extra space, O(path length) reconstruction.

When to use Floyd-Warshall

You need all-pairs shortest distances.
The graph is small — typically V ≤ 400 or so. At V = 500, you’re looking at 1.25 × 10^8 operations, still fast in C++ but borderline in Python/Java.
The graph is dense — when E ≈ V², the alternative (V runs of Dijkstra) is O(V · (V + E) log V) = O(V³ log V) — Floyd-Warshall wins.
You need negative-cycle detection over all sources/sinks.
The graph changes and you want to support fast updates — Floyd-Warshall isn’t naturally dynamic, but the structure lends itself to incremental edge-addition variants.

When NOT to use Floyd-Warshall

The graph is sparse and large. V = 10000 with E = O(V) makes O(V³) = 10^12 — infeasible. Run Dijkstra from each source instead.
You only need a single source. Floyd-Warshall computes way more than you need.

The “transitive closure” trick

If you only care whether vertex i can reach vertex j (not the distance), use Floyd-Warshall with + replaced by OR and min replaced by OR:

for k in range(n):
    for i in range(n):
        for j in range(n):
            reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])

O(V³), same shape. Called Warshall’s algorithm in older textbooks. Useful for any problem framed as “given these direct relationships, who is connected to whom?”

Quick check

In the Floyd-Warshall triple loop, why must k be the outermost loop?

What's the time complexity of running Dijkstra from every source on a sparse graph (E = O(V)) vs Floyd-Warshall, for computing all-pairs shortest paths?

Floyd-Warshall is the all-pairs specialist — and a beautiful example of the interval/DP thinking from Day 14 applied to graphs. Next: Picking the Right One — a side-by-side decision guide so you instantly know which of the four to reach for.

Bellman-Ford Picking the Right Algorithm

Finished this page?