Dijkstra's Shortest Path

[ Last Updated: 2024-05-03 ]

Scope of Application

The shortest path problem is one of the most common issues in graph theory. Dijkstra's algorithm is one of the most widely used algorithms for solving the Single-Source Shortest Path problem (specifically for graphs without negative edge weights). It calculates the shortest distance from a given starting point to all other nodes in the graph.

Core Algorithm

Dijkstra's algorithm operates based on a greedy strategy:

Imagine we divide the graph into two parts: one part consists of nodes where the shortest path has already been calculated {X} (the "Old World" in the diagram below), and the other part consists of nodes not yet calculated {V-X} (the "New World"). We can call the edges connecting X to V-X "bridge" edges.

From these bridge edges, we select the one that results in the shortest total path distance from the origin A to the bridge's tail. We then incorporate the node connected by this bridge into our explored territory (for example, node E reached via the path ACE in the diagram). Node E then becomes part of the Old World. Simultaneously, other edges connected to E that were previously deep in the New World become new "bridge" edges and are added to the collection. This process repeats until the New World is fully explored ({V-X} is empty).

Mathematical Proof

Why does this greedy strategy work? We can prove it using mathematical induction:

Proposition: For all explored nodes $v$ , the greedy path length $B[v]$ found by Dijkstra's strategy is indeed the true shortest path $S[A \to v]$ from the origin A to node $v$ (i.e., $B[v] = S[A \to v]$ ).

Base Case: For the starting node A, $B[A] = 0$ and $S[A \to A] = 0$ . In the absence of negative weights, this is clearly true.

Inductive Step: Assume the proposition holds for the first $k-1$ iterations. For the $k$ -th iteration, we need to prove that for the new node $w^*$ added by Dijkstra, there is no shorter path from A to $w^*$ .

Suppose $P$ is any arbitrary path from A to $w^*$ (the new node found by Dijkstra). $P$ must contain at least one bridge edge (it must cross from the Old World to the New World at least once). Let the first crossing edge be $(y, z)$ , where $y \in \{X\}$ and $z \in \{V-X\}$ . We can split $P$ into: $P(A \to y) + P(y \to z) + P(z \to w^*)$

By the inductive hypothesis, $B[y]$ is the shortest path from A to $y$ , so the distance in path $P$ from the source to $y$ cannot be shorter than $B[y]$ .

According to Dijkstra’s greedy strategy, the node $w^*$ we choose always satisfies: $B[v^*] + l[v^* \to w^*]$ is the minimum among all paths crossing via bridge edges. Thus: $B[v^*] + l[v^* \to w^*] \leq B[y] + l[y \to z]$ .

Since there are no negative edge weights, it follows that $P(z \to w^*) \geq 0$ .

Combining these: $P(A \to w^*) = P(A \to y) + P(y \to z) + P(z \to w^*) \geq B[y] + l[y \to z] \geq B[v^*] + l[v^* \to w^*]$ .

Therefore, no path $P$ can be shorter than the path found by Dijkstra. By induction, the hypothesis is true.

Optimization: Heap-Based Dijkstra

A basic implementation of Dijkstra's algorithm has a time complexity of approximately $O(n \cdot m)$ (iterating through nodes and edges). However, by using a Heap data structure, we can optimize the complexity to $O(m \cdot \log n)$ .

Heap (opens in a new tab) is a tree-based structure (usually a complete binary tree) with these properties:

Min-Heap: Each node's value is $\leq$ its children's values.
Max-Heap: Each node's value is $\geq$ its children's values.

In Python, we use the heapq module to handle these operations efficiently.

Heapq in Python

Here is a quick look at heapq basic operations:

import heapq 
import random
 
# Generate 10 random elements and put them in a heap
h = [random.randint(1, 100) for _ in range(10)] 
print("Random list:", h) 
 
heapq.heapify(h) 
print("Min-heap list:", h) 
 
# Pop the minimum value
min_val = heapq.heappop(h) 
print("Popped minimum:", min_val) 
print("Remaining heap:", h) 
 
# Insert a new random number 
x = random.randint(1, 100) 
heapq.heappush(h, x) 
print("Heap after insertion:", h)

Dijkstra Implementation (Heap Optimized):

Python

import heapq

class Graph: # Create Graph class to store edges and nodes
    def __init__(self):
        self.nodes = set() 
        self.edges = {} 
        
    def add_node(self, value):
        self.nodes.add(value) 
        if value not in self.edges:
            self.edges[value] = {}

    def add_edge(self, from_node, to_node, weight):
        self.add_node(from_node)
        self.add_node(to_node)
        self.edges[from_node][to_node] = weight 
        # Using directed graph logic here, though Dijkstra works for undirected too

    def dijkstra(self, start):
        # Initialize shortest distances as infinite
        distances = {node: float('inf') for node in self.nodes} 
        distances[start] = 0 

        # Use a heap to track candidate bridge path lengths
        # Stored as (distance, node)
        heap = [(0, start)] 

        while heap:
            # Pop the smallest distance node (Greedy Step)
            current_distance, current_node = heapq.heappop(heap)

            # If the popped distance is greater than the recorded shortest, skip it
            if current_distance > distances[current_node]:
                continue

            for neighbor, weight in self.edges[current_node].items():
                distance = current_distance + weight
                # If a shorter path to the neighbor is found, update and push to heap
                if distance < distances[neighbor]:
                    distances[neighbor] = distance
                    heapq.heappush(heap, (distance, neighbor)) 

        return distances

I actually wrote this Dijkstra post before officially introducing Heaps. It was while reviewing Dijkstra that I realized I needed to brush up on data structures, which led to the recent series of posts on Heaps, Balanced Trees, and Hash Tables. Dijkstra remains an absolute classic, so I'm republishing it here for reference and review.

Untitled Penguin 2024/05/26 20:44

Binary Search Tree Graphsearch