Heaps and Priority Queues
[ Last Updated: 2024-05-11 ]
Introduction: Since last week's look into Dijkstra's optimization involved heaps, I decided to dive deeper into data structures this week, starting with the Heap.
What is a Heap? Why use it?
A Heap is a tree-based data structure typically used to provide fast access to the maximum or minimum value in a collection of data. Heaps are usually constructed as Complete Binary Trees. A valid heap satisfies one of the following properties:
- Min-Heap: The value of each node is less than or equal to the values of its children.
- Max-Heap: The value of each node is greater than or equal to the values of its children.
Although we visualize a heap as a tree, its "complete binary tree" property allows it to be stored directly in a standard list. This is because indices in the list correspond directly to positions in the tree:
- For a node at index (starting from 1), its children are at and .
- Its parent is at (integer division).
Because of this structure, search (for the extreme value), insertion, and deletion can be maintained at , which matches the height of the tree. This is a massive improvement over the time required to scan a simple list.
Basic Heap Operations
1. Insertion (Insert and Bubble Up) -
- Insert: Add the new element to the end of the list (the bottom-most, right-most position in the tree).
- Bubble Up: Repeatedly swap the element with its parent until the heap property is restored.

2. Extraction (Extract and Bubble Down) -
- Extract: Remove the root (min/max value) and replace it with the last element in the heap.
- Bubble Down: Repeatedly push this "new root" down by swapping it with its smallest/largest child until the heap property is restored.

3. Deletion -
- Remove the node at index .
- Replace it with the very last element in the heap and perform a bubble down (similar to extraction).
4. Heapify -
- Throw the entire initial sequence into a list.
- Start from the first non-leaf node () and work upwards, performing bubble down at each node. The total time complexity is .
Note: Finding the extreme value is —just peek at the root!
Heapq in Python
Python provides the heapq module to implement min-heap operations effortlessly.
import heapq
import random
# Generate 10 random elements and heapify them
h = [random.randint(1, 100) for _ in range(10)]
print("Randomly generated list:", h)
heapq.heapify(h)
print("Min-Heap list:", h)
# Peek at the minimum
print("Minimum element:", h[0])
# Extract the minimum
min_val = heapq.heappop(h)
print("Popped element:", min_val)
print("Remaining heap:", h)
# Push a new random number
x = random.randint(1, 100)
heapq.heappush(h, x)
print("Heap after pushing new element:", h)Note: heapq does not create a new class; it manages a standard list. You can start with an empty list heap = [] and use heapq.heappush(heap, item) to maintain heap properties automatically.
Practical Case: LeetCode 295. Find Median from Data Stream
Problem Description
The median is the middle value in an ordered list. If the size is even, it is the mean of the two middle values. Implement a class to add numbers from a stream and find the median efficiently.
Strategy
This can be solved efficiently using two heaps:
-
A Max-Heap for the left half (values smaller than the median).
-
A Min-Heap for the right half (values larger than the median).
Note: Since Python's heapq only supports min-heaps, we store negative values in the max-heap to simulate max-heap behavior.
Logic:
-
If the new number is smaller than the min of the right side, push to the left (max-heap). Otherwise, push to the right (min-heap).
-
Balance: Ensure the size difference between the two heaps is no more than 1.
-
Median: If sizes are unequal, the median is the top of the larger heap. If equal, it's the average of both tops.
Python
import heapq
class MedianFinder(object):
def __init__(self):
self.maxheap = [] # Left side (stored as negative)
self.minheap = [] # Right side
def addNum(self, num):
if not self.minheap or num < self.minheap[0]:
heapq.heappush(self.maxheap, -num)
else:
heapq.heappush(self.minheap, num)
# Rebalance
if len(self.maxheap) - len(self.minheap) > 1:
heapq.heappush(self.minheap, -heapq.heappop(self.maxheap))
elif len(self.minheap) - len(self.maxheap) > 1:
heapq.heappush(self.maxheap, -heapq.heappop(self.minheap))
def findMedian(self):
if not self.maxheap and not self.minheap:
return None
if len(self.maxheap) > len(self.minheap):
return float(-self.maxheap[0])
elif len(self.minheap) > len(self.maxheap):
return float(self.minheap[0])
else:
return (-self.maxheap[0] + self.minheap[0]) / 2.0Summary
Heaps are powerful for handling dynamic data and implementing priority queues. Their strength lies in rapid access to extremes and efficient updates. In the real world, they are used for database maintenance, game event processing, and real-time task scheduling where priorities change constantly.
—Untitled penguin 2024/5/11 0:49