Python's heapq module
Often when working with collections of data, you may want to find the
smallest or largest item. It's easy enough to write a function that
iterates through the items and returns the smallest or largest one, or
use the builtin min()
, max()
, or
sorted()
functions. Another interesting way may be
implementing a heap (priority) queue.
Python provides a pretty convenient module called heapq
that does that for you. heapq
comes with a cool set of
inbuilt functions that you can read more about in the
docs.
Finding the smallest items
Find 3 of the smallest items using the nsmallest
function:
import heapq
numbers = [9, 45, 21, 4, 63, 3, 109]
print(heapq.nsmallest(3, numbers)) # [3, 4, 9]
Also, converting a list into a heap using the heapify
function automatically sets the smallest item as the first:
import heapq
numbers = [9, 45, 21, 4, 63, 3, 109]
heapq.heapify(numbers)
print(numbers)
# Output: [3, 4, 9, 45, 63, 21, 109]
You can then pop from the heap with heappop()
:
heapq.heappop(numbers) # 3
print(numbers)
# Output: [4, 45, 9, 109, 63, 21]
heapq.heappop(numbers) # 4
print(numbers)
# Output: [9, 45, 21, 109, 63]
Finding the largest items
Find 3 of the largest items using the nlargest
function:
import heapq
numbers = [9, 45, 21, 4, 63, 3, 109]
print(heapq.nlargest(3, numbers)) # [109, 63, 45]
A more practical example
import heapq
people = [
{'fname': 'John', 'lname': 'Doe', 'age': 30},
{'fname': 'Jane', 'lname': 'Doe', 'age': 25},
{'fname': 'Janie', 'lname': 'Doe', 'age': 10},
{'fname': 'Jane', 'lname': 'Roe', 'age': 22},
{'fname': 'Johnny', 'lname': 'Doe', 'age': 12},
{'fname': 'John', 'lname': 'Roe', 'age': 45}
]
oldest = heapq.nlargest(2, people, key=lambda s: s['age'])
print(oldest)
# Output:
# [{'fname': 'John', 'lname': 'Roe', 'age': 45},
# {'fname': 'John', 'lname': 'Doe', 'age': 30}]
youngest = heapq.nsmallest(2, people, key=lambda s: s['age'])
print(youngest)
# Output:
# [{'fname': 'Janie', 'lname': 'Doe', 'age': 10},
# {'fname': 'Johnny', 'lname': 'Doe', 'age': 12}]
It should be noted though that the nsmallest(n, iterable)
and nlargest(n, iterable)
functions perform best for
smaller values of n
. For larger values, it is more
efficient to use the sorted()
function. Also, when
n==1
, it is more efficient to use the builtin
min()
and max()
functions.