tl;dr Python type hints are super useful! I like keeping function arguments immutable, so I suggest typing lists as sequences and using either frozen dataclasses or collections.abc.Mapping instead of dictionaries. I haven’t really seen a blog post that covers both Sequence and Mapping (most focus on one or the other), so I wanted to write one that teaches both.
I’ve been getting into functional programming, so that’s shaped how I’ve been using python type hints. In particular, I’ve tried to make function arguments immutable. Real Python has a great article on understanding what mutability and immutability are, as well as examples of which data types are immutable or mutable.
Lists, dictionaries, and sets are mutable in python, so they can change state as the program runs. That can be particularly tricky when you pass a list or dict into a function and then modify it within the function: it’s difficult to fully consider what will happen to that list or dict outside of the function. Type hints offer some good options for avoiding mutation of data passed into a function, which saves you from having to reason about the side effects of passing mutable data around.
using Sequence
Steve Brazier has a great blog post on why to use sequences instead of lists. I also agree with its focus on function arguments. When I was starting to learn about immutability with functional programming, I first looked at using immutable data structures (like tuples). But that contorts python into a language it isn’t. I like what the blog post calls “soft immutability”, where the type checker is what prevents modifying a list passed into a function.
Here’s a core example from the post but please read the whole thing!
If I try and write this code:
def calculate_sum_and_add_ten(items: Sequence[float]):
    items.append(10)
    return sum(items)
then I will get an error from mypy (or my IDE or any other type checker I may be running):
error: "Sequence[float]" has no attribute "append" [attr-defined]
This means I won’t accidentally mutate a list I pass in to the function. If I expect the function to mutate the list then I can communicate this fact by altering the typehint to a list. This helps make my intent clear. I took a list because I wanted a list (with all its mutability).
Keeping mutation within a function seems to be a fair compromise within python while gaining some benefits of functional programming. It’ll be easier to reason about purely what the function is doing. It’s especially beneficial to take advantage of immutability when dealing with async/await in python. There’s a lot more complexity any time there is asynchronous code and you want to do what you can to simplify it all.
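As a rough sketch of what “keeping mutation within a function” can look like (this variant is my own illustration, not code from Steve Brazier’s post), the function below copies the incoming sequence into a local list and only mutates that copy. The argument is still typed as Sequence, so a type checker would still complain if the function touched the caller’s data.

```python
from collections.abc import Sequence


def calculate_sum_and_add_ten(items: Sequence[float]) -> float:
    # Copy into a local list; any mutation stays inside this function.
    local_items = list(items)
    local_items.append(10)
    return sum(local_items)


prices = [1.5, 2.5]
total = calculate_sum_and_add_ten(prices)
# The caller's list is untouched, and tuples are accepted as well.
assert prices == [1.5, 2.5]
assert calculate_sum_and_add_ten((1.5, 2.5)) == total
```

Because Sequence only promises read access, the same function accepts lists, tuples, and other sequence types without any changes.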
One tiny thing to note about the Sequence type: in Python 3.9+ you should import it from collections.abc instead of typing, because the typing alias is deprecated.
using frozen dataclasses, pydantic, and Mapping
Along similar lines for dictionaries, Roman Imankulov has a great post about the
options for python dictionaries. Frozen dataclasses are especially well suited
when you know the keys that would be in your dict. Pydantic is super useful for
runtime type validation. (I have used pydantic a lot for validating json
configuration files that my code may reference at runtime.) An additional option
for preventing dictionaries from mutating is collections.abc.Mapping. It’s the
most similar parallel with the “soft immutability” of sequences instead of
lists. Mapping is a good choice when you don’t know what the keys will be in the
dict.
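To make the frozen dataclass option concrete, here is a minimal sketch using only the standard library. The PipelineConfig class, its fields, and the with_batch_size helper are hypothetical names made up for illustration. Unlike the “soft immutability” of Mapping, a frozen dataclass is also enforced at runtime, and dataclasses.replace gives you a modified copy instead of an in-place change.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class PipelineConfig:
    # Hypothetical config with keys that are known ahead of time.
    source: str
    batch_size: int


def with_batch_size(config: PipelineConfig, batch_size: int) -> PipelineConfig:
    # Return a new instance; the config passed in is never modified.
    return replace(config, batch_size=batch_size)


config = PipelineConfig(source="orders", batch_size=100)
bigger = with_batch_size(config, 500)

try:
    config.batch_size = 1  # a type checker flags this line as well
except FrozenInstanceError:
    print("frozen dataclass fields cannot be reassigned")
```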
Here’s an example from the post of what Mapping will catch with mypy (or other python type checkers). Please read the whole post as well!
I defined a color mapping and annotated a function. Notice how the function uses the operation allowed for dicts but disallowed for Mapping instances.
# file: colors.py
from typing import Mapping

colors: Mapping[str, str] = {
    "red": "#FF0000",
    "pink": "#FFC0CB",
    "purple": "#800080",
}

def add_yellow(colors: Mapping[str, str]):
    colors["yellow"] = "#FFFF00"

if __name__ == "__main__":
    add_yellow(colors)
    print(colors)
Despite wrong types, no issues in runtime.
$ python colors.py
{'red': '#FF0000', 'pink': '#FFC0CB', 'purple': '#800080', 'yellow': '#FFFF00'}
To check the validity, I can use mypy, which raises an error.
$ mypy colors.py
colors.py:11: error: Unsupported target for indexed assignment ("Mapping[str, str]")
Found 1 error in 1 file (checked 1 source file)
As with Sequence, I would also import from collections.abc instead of typing for Python 3.9+.
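For comparison, here is one possible non-mutating version of that function. It is my own sketch, not code from Roman Imankulov’s post, and the with_yellow name is made up. It imports Mapping from collections.abc and returns a new dict, so it should pass a type checker and leave the caller’s data alone.

```python
from collections.abc import Mapping


def with_yellow(colors: Mapping[str, str]) -> dict[str, str]:
    # Build and return a new dict; the mapping passed in is only read.
    return {**colors, "yellow": "#FFFF00"}


colors = {
    "red": "#FF0000",
    "pink": "#FFC0CB",
    "purple": "#800080",
}
extended = with_yellow(colors)

# The original mapping still has three entries; the new dict has four.
assert "yellow" not in colors
assert extended["yellow"] == "#FFFF00"
```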
what really needs to be type hinted
With type hints, I think a lot about what data structures I create vs ones I receive and have little control over. In working as a data engineer, I try to think about data in terms of the robustness principle.
be conservative in what you send, be liberal in what you accept
All data that I send should be carefully type hinted. Whereas I think about the
data I receive in terms of an ELT model. That model tries to store the raw data
as best possible (instead of doing transformations early in the process like
with an ETL model). So I don’t want to spend too much time on how I type
hint the constantly changing data I receive. For example, I may be fine with
using dict[str, Any] or list[Any] for some json API responses. The most
flexible way I’ve dealt with the shape of different responses has been using
python’s pattern matching. In particular with looking out for a 200 status
response but different ways that errors are actually passed in the response
body.
I think a lot of concepts from functional programming map particularly well to data engineering, where data can ideally be a series of one-way transformations, and even more so with async/await. I’ve personally experienced the issue of unintended mutations outside of the function scope in async python before I focused on immutability, which is why I’ve wanted to prevent it from happening again. Many programming languages make function arguments immutable by default, for good reason. Immutability with type hints can bring simplicity and clarity to your python code while keeping it pythonic.
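As a closing sketch of the async case (the with_extra and main functions are hypothetical, and asyncio.sleep(0) stands in for real I/O), two coroutines below receive the same list concurrently. Because the argument is typed as Sequence and each coroutine builds a new list, neither can affect the other or the caller.

```python
import asyncio
from collections.abc import Sequence


async def with_extra(items: Sequence[int], extra: int) -> list[int]:
    # Yield to the event loop, as a real network or disk call would.
    await asyncio.sleep(0)
    # Return a new list instead of appending to the shared one.
    return [*items, extra]


async def main() -> tuple[list[int], list[int]]:
    shared = [1, 2, 3]
    first, second = await asyncio.gather(
        with_extra(shared, 10),
        with_extra(shared, 20),
    )
    # The shared list is unchanged no matter how the tasks interleave.
    assert shared == [1, 2, 3]
    return first, second


if __name__ == "__main__":
    print(asyncio.run(main()))
```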