Data structures
Contents
Data structures#
So far, we have seen three main examples of data structures: lists, tuples, and arrays. These all do roughly the same thing: store a sequence of objects, and allow you to access the objects using their position in the sequence. This is only one way of arranging data, though it is perhaps the most important. When computer scientists and mathematicians design algorithms, one of the key considerations is exactly how to arrange the data that is used. This choice can make an enormous difference to the performance of the algorithm. The two next structures that you are likely to come across are sets and dicts.
Sets#
Python sets, as you might hope, are used to represent (finite) mathematical sets, i.e. finite collections of objects with no duplicates. You could also represent this using a duplicate-free list
, but it is clearer and often more efficient to use a set
. Here’s a basic example:
# create an empty set
my_set = set()
# add 1 to the set
my_set.add(1)
# try to add 1 again
my_set.add(1)
#display the set
my_set
{1}
There are several things to note. Firstly, Python didn’t complain when we tried to add the number 1 a second time - it just didn’t add it.
Secondly, we add
things to a set
rather than append
them as for a list
. This is because sets
are not inherently ordered. They do not care what order elements were inserted or deleted in, or how you would sort the objects naturally.
This might seem strange to you as most mathematical sets you will have seen have an obvious order on their elements, but strictly speaking mathematical sets are not inherently ordered either - the order is an extra structure you put on top of the set.
To make things more confusing, sets
of will often look like they are naturally ordered when you ask Python to display them.
tea_set = set("teacup") # make a set from the letters in "teacup"
tea_set # the set appears to be in order
{'a', 'c', 'e', 'p', 't', 'u'}
# Now try looping through the elements of the set.
# Note that the order now seems totally arbitrary!
for x in tea_set:
print(x)
e
t
u
p
c
a
The moral of all of this is to not rely on any order from a set
. If you really want the elements in value order, then you must use sorted
, which returns a list with the values in sorted order:
sorted(tea_set)
['a', 'c', 'e', 'p', 't', 'u']
Set operations#
There are a number of useful operations to be aware of with sets. Here are a few of them:
# Create two example sets
set1 = {1, 2, 3, "a", "b"}
set2 = {"a", "c", "e", 4, 1}
Add or remove an element#
set1.add("abc")
set1
{1, 2, 3, 'a', 'abc', 'b'}
set1.remove("abc")
set1
{1, 2, 3, 'a', 'b'}
Membership#
1 in set1
True
"b" in set2
False
Intersection#
# Read "the elements that are in set1 and in set2"
set1 & set2
{1, 'a'}
# An alternative method
set1.intersection(set2)
{1, 'a'}
Union#
# The | symbol usually means "or", so read this as
# "the elements that are in set1 or in set2"
set1 | set2
{1, 2, 3, 4, 'a', 'b', 'c', 'e'}
# Alternatively:
set1.union(set2)
{1, 2, 3, 4, 'a', 'b', 'c', 'e'}
Difference#
# The elements of set1 that are not in set2
set1 - set2
{2, 3, 'b'}
# Alternatively:
set1.difference(set2)
{2, 3, 'b'}
Set efficiency#
Sets are specialised for the sort of operations shown above. If you are doing a lot of them in your code, you should probably be using a set
rather than a list
or similar data structure. For example, testing membership in a set
roughly takes a small and constant amount of time, while testing membership in a list
takes time proportional to the length of the list
- so for large lists
membership testing will be much slower than the same membership testing in a set
.
Dicts#
Short for “dictionary”, a dict
stores pairs of keys and values. You can then “look up” the value associated with a given key. This is analagous to how language dictionaries store words (keys) and definitions (values). The keys have to be unique, but the values do not. Here are some examples:
# Create a dict with keys being student ids and values being names.
# I'll make the student ids strings to avoid leading zero issues.
students = {"00112" : "Hypatia",
"13401" : "Maryam Mirzakhani",
"4105201" : "Emmy Noether",
"10231": "Sophie Kowalevski"}
# Get a value by indexing with the key
students["4105201"]
'Emmy Noether'
#Trying to access a non-existent key causes an error
students[0]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [17], in <cell line: 2>()
1 #Trying to access a non-existent key causes an error
----> 2 students[0]
KeyError: 0
Dict operations#
The operations for dicts
are similar to those for sets
.
Add or remove a key-value pair#
To add a new key-value pair, just set the value for that key:
students["01121"] = "Ada Lovelace"
students
{'00112': 'Hypatia',
'13401': 'Maryam Mirzakhani',
'4105201': 'Emmy Noether',
'10231': 'Sophie Kowalevski',
'01121': 'Ada Lovelace'}
del students["01121"]
students
{'00112': 'Hypatia',
'13401': 'Maryam Mirzakhani',
'4105201': 'Emmy Noether',
'10231': 'Sophie Kowalevski'}
Membership#
To check whether a key is in a dict
, use in
as usual:
"00112" in students
True
There are many other things you can do with dictionaries, which we will not dive into here. You can get a flavour by skimming through the documentation.