💎 Pure vs impure functions 💎

This week in functional programming for data scientists

May 05, 2020

That’s me learning about pure functions.

I’ve been dabbling in and learning functional programming for a while now. What is functional programming, one might ask? This is how Wikipedia describes it:

Functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that each return a value, rather than a sequence of imperative statements which change the state of the program or world.

In other words, instead of writing code that continuously changes the state of the program, we write a series of functions that stack onto each other like Lego blocks, passing the data from one end to the other.

I learned that most of the functions that I’ve written so far are not actually functions. They are something like fake functions or as I like to call them, pseudo-functions.

Let’s find out what separates a real function from a pseudo-function.

Definition of a function

What separates a real function from a pseudo-function? We turn to Wikipedia for the answer:

In mathematics, a function is a binary relation over two sets that associates to every element of the first set exactly one element of the second set. Typical examples are functions from integers to integers or from the real numbers to real numbers.

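In other words, a function is a thing that maps the elements of one set to the elements of another set. An important part of this definition is that every input is matched to exactly one output: the same input can never lead to two different outputs. (Two different inputs are allowed to share the same output, though.)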

Personally, I like to visualise functions more as machines that do something. Something goes in, something comes out.

An example of a pure function

What does a pure function look like? This is an example of an innocent pure function that adds 1 to something.

This function is what we call a pure function.

A pure function is a function that:

Always returns the same output given the same input
Produces no side effects

We can see that this function is indeed pure. Given the same input, say 3, it always returns the same thing, namely 4. This satisfies the first condition for a pure function. The function also produces no side effects: it does not change the state of the program in any way. Hence, it is pure.
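In code, such a function could look like the minimal Python sketch below. The name add_one is just my choice for illustration.

```python
def add_one(x):
    """Return x + 1 without reading or changing anything outside the function."""
    return x + 1


first_call = add_one(3)
second_call = add_one(3)

# Same input, same output, no matter how often we call it.
print(first_call, second_call)
```

Calling it twice with 3 gives 4 both times, and nothing else in the program has changed afterwards.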

Now let’s consider an example of an impure function to see what the difference is.

An example of an impure function

Let’s now consider the following impure function that adds a number to our argument.

This function is most definitely not pure.

It does not satisfy the first condition of purity. This function does not always return the same output given the same input. For the same input argument (x) we can get different outputs (green squares) based on the value of the global variable (c). Oh no!
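Here is a small Python sketch of what such a function might look like. I am assuming the global is a plain module-level variable called c, and add_c is a made-up name.

```python
c = 1  # global variable, not part of the function's inputs


def add_c(x):
    """Impure: the result depends on the global c as well as on x."""
    return x + c


first = add_c(3)   # c is 1 here, so this is 3 + 1

c = 10             # somebody, somewhere, changes the global

second = add_c(3)  # same argument, but now this is 3 + 10

print(first, second)
```

The argument is 3 both times, yet the two results differ because c changed in between.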

How could this happen and why does this even matter?

Sources of impurity

Where does this impurity come from? Excluding whatever goes on in the machine, there are two sources of impurity: stuff we put in (1) and stuff that comes out of it (2).

The first source of impurity (1) comes from the fact that the function uses variables that are not part of the input parameters. This breaks the rule that one input gives exactly one output, because you can get different results for the same input parameters depending on variables that are not part of those parameters!

The second source of impurity (2) comes from changing the state of the program. Your pure function should never change the state of the program; it should just return something. More specifically, it should always return the same thing given the same input.
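To make the second source concrete, here is a hedged Python sketch. The names total, add_to_total and add are hypothetical; the point is only the contrast between changing outside state and returning a value.

```python
total = 0  # program state living outside the function


def add_to_total(x):
    """Impure: changes the global total as a side effect."""
    global total
    total += x
    return total


def add(total_so_far, x):
    """Pure: the caller passes the state in and gets a new value back."""
    return total_so_far + x


# Impure version: the same call gives a different answer each time.
r1 = add_to_total(5)
r2 = add_to_total(5)

# Pure version: the same call always gives the same answer.
p1 = add(0, 5)
p2 = add(0, 5)

print(r1, r2, p1, p2)
```

The impure version leaves a trace behind in total, so every later call sees a different world. The pure version leaves nothing behind.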

Why does this matter?

If you look at a single function in isolation, this does not seem like a big problem. But you always have to realise that we as data scientists and software engineers are working on systems of things: big, crazy, difficult systems of unwieldy complexity, composed of simpler little components.

This is what happens if we stack up several impure functions on top of each other. Check out what is happening with the system state.

As you can imagine, in even larger systems with hundreds or thousands of functions, it becomes practically impossible to keep track of the system state.

In a system with impure functions, state becomes extremely hard to manage. This results in unwieldy code and bugs that are hard to track down.
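A toy Python sketch of the problem, assuming a shared dictionary called state that every function reads and writes. All names here are invented for illustration.

```python
state = {"values": [3, 1, 2], "scale": 2}  # shared, mutable system state


def sort_values():
    """Impure: sorts the shared list in place."""
    state["values"].sort()


def scale_values():
    """Impure: reads the shared scale and overwrites the shared values."""
    state["values"] = [v * state["scale"] for v in state["values"]]


def bump_scale():
    """Impure: silently changes what scale_values will do next time."""
    state["scale"] += 1


sort_values()
scale_values()
bump_scale()
scale_values()  # same call as before, different effect

print(state)
```

None of these functions takes an argument or returns a value, so the only way to know what the second scale_values() call does is to replay every call that came before it.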

Compare this with its functional alternative where we stack up functions that map exactly one input to one output and produce no side effects.
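As a rough Python sketch of that idea, with made-up names sorted_copy, scaled and pipeline: each function takes its data as an argument and returns a new value, so the output of one can go straight into the next.

```python
def sorted_copy(values):
    """Pure: returns a new sorted list, leaves the input alone."""
    return sorted(values)


def scaled(values, factor):
    """Pure: returns a new list with every value multiplied by factor."""
    return [v * factor for v in values]


def pipeline(values):
    """Stack the pure functions: sort, scale by 2, then scale by 3."""
    return scaled(scaled(sorted_copy(values), 2), 3)


raw = [3, 1, 2]
result = pipeline(raw)

print(raw, result)
```

There is no shared state to track here. The input list raw is untouched, and pipeline(raw) gives the same result every time it is called.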

WOAH!

You might think to yourself “wow that’s amazing, how do I get in on this?!”

Getting started with functional programming

Knowing all this stuff about pure and impure functions is nice and all, but how do we apply all of this?

The best way to start with functional programming is to write pure functions in the language you already use.

I admit, it is hard to get started with functional programming, but there is one concrete thing you can do: whenever you write a function, pass in everything it needs as arguments and have it return its result instead of changing something outside itself.
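For example, here is how a small data-cleaning helper might be rewritten in Python. Both functions are hypothetical and only meant as a sketch of the habit: return a new value instead of overwriting the caller's data.

```python
def normalise_in_place(values):
    """Impure: overwrites the caller's list."""
    top = max(values)
    for i, v in enumerate(values):
        values[i] = v / top


def normalise(values):
    """Pure: builds and returns a new list, input left untouched."""
    top = max(values)
    return [v / top for v in values]


data = [2, 4, 8]
result = normalise(data)  # data is still [2, 4, 8] afterwards

scratch = [2, 4, 8]
normalise_in_place(scratch)  # scratch itself has been overwritten

print(data, result, scratch)
```

Both versions compute the same numbers. The difference is that after normalise(data) you still have your original data, while normalise_in_place has quietly replaced it.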

Up next

For my next post, I have no idea what I will write. It’s probably something on functional programming or marketing or productivity.

The Data Scientist