Shallow & deep copying in python

• 3 min read

What happens when a variable is assigned to another variable in python? For example:

>>> x = 5
>>> y = x

Both x and y will have the value 5. But, when x was assigned to y, y was not created as a completely new/separate object. Instead, an alias for x was created. That is, y points to the memory location of x. It does not have it's own memory location - yet.

>>> id(x)
140428600776960
>>> id(y)
140428600776960
>>> x is y
True

You may never have any problems with this when working with immutable types because the alias is broken as soon as either of the two variables change.

>>> x += 2
>>> x
7
>>> y
5
>>> id(x)
140539682924864
>>> id(y)
140428600776960
>>> x is y
False

But when working with mutable types, the alias is not broken when the original is updated. This means changes in x would reflect in y.

>>> x = [1,2,3]
>>> y = x
>>> x
[1, 2, 3]
>>> y
[1, 2, 3]
>>> x is y
True
>>>
>>> x.append(4)
>>> x.append(5)
>>> x
[1, 2, 3, 4, 5]
>>> y
[1, 2, 3, 4, 5]
>>> x is y
True

y was updated externally. This might be a cause of bugs if for example a value is updated by an external library and other variables are affected.

This can be prevented by creating shallow/deep copies of objects instead of using assignment.

Shallow copy

A shallow copy creates a new object, then populates it with references of the objects in the original object. Continuing with the previous example, a shallow copy can be created using either the list() or copy() command.

>>> z = list(x)
>>> z
[1, 2, 3, 4, 5]

Now if some more values are appended to x, y will still be affected by z will not.

>>> x.append(6)
>>> x.append(7)
>>> x
[1, 2, 3, 4, 5, 6, 7]
>>> y
[1, 2, 3, 4, 5, 6, 7]
>>> z
[1, 2, 3, 4, 5]

However, a shallow copy doesn't fully solve the problem because even though a new list was created, the objects in the list are still references to the objects in x.

As it is currently, updating x[0] would not affect z[0] because - immutable objects - the alias would be broken. But, if we were dealing with a list of lists, an update in x[0] would affect z[0].

>>> x = [[1,2], [3,4]]
>>> y = x
>>> z = list(x)
>>> x
[[1, 2], [3, 4]]
>>> y
[[1, 2], [3, 4]]
>>> z
[[1, 2], [3, 4]]
>>> x.append([5,6])
>>> x
[[1, 2], [3, 4], [5, 6]]
>>> y
[[1, 2], [3, 4], [5, 6]]
>>> z
[[1, 2], [3, 4]]
>>>
>>> x[0][1] = 'edited'
>>> x
[[1, 'edited'], [3, 4], [5, 6]]
>>> y
[[1, 'edited'], [3, 4], [5, 6]]
>>> z
[[1, 'edited'], [3, 4]]

Deep copy

A deep copy creates a new object, and completely new instances of the objects in it. That is, a deep copied object is completely independent of the original. Updating objects in the original would not affect the deep copied object since there's no longer any connection.

>>> import copy
>>> x = [[1,2], [3,4]]
>>> z = copy.deepcopy(x)
>>> x
[[1, 2], [3, 4]]
>>> z
[[1, 2], [3, 4]]
>>> x is z
False
>>> x[0][1] = 'edited'
>>> x
[[1, 'edited'], [3, 4]]
>>> z
[[1, 2], [3, 4]]
🏷  python