A Functional Imperative Mix

How does one add immutability to a language without starting a cultural revolution, when the established idioms center on imperative for loops? I have been mulling over this question, and I think I have found a good answer.

Coexistence

Non-local mutation causes hard-to-find bugs and makes compiler analyses more difficult, but we cannot ban it outright. So what do we do?

The original plan was that classes would create mutable objects, just like in Python, while records and sum types would be deeply immutable. Unfortunately the “deeply immutable” part is problematic, because it splits the world into mutable and immutable islands. Records would not be able to contain regular lists or dictionaries, only other immutable types.

What we need is a way for mutable objects to become immutable when they are ready, so that they can be placed in records and other immutable objects. You should be able to mutate an object when populating it and freeze it when done. That would kill two birds with one stone: we could continue to use the same idioms for constructing objects, and keep using efficient data structures for the frozen form.
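Python already offers shallow approximations of this populate-then-freeze idiom, which make a useful reference point. The sketch below uses only standard-library tools; note that both are shallow or copying, whereas the goal here is deep, copy-free freezing:

```python
from types import MappingProxyType

# Populate a dict while it is still mutable...
config = {}
config["host"] = "localhost"
config["port"] = 8080

# ...then expose a read-only view. This is shallow: the underlying
# dict can still be mutated through the `config` reference.
frozen_config = MappingProxyType(config)

# For sequences, tuple() gives a frozen snapshot, at the cost of a copy.
nums = []
nums.append(1)
nums.append(2)
frozen_nums = tuple(nums)
```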

Persistence forces us to change

First, let me explain why I do not think persistent data structures à la Clojure are the solution. There are two reasons.

The first is that it requires adopting a different and more functional style. Take this function:

def positive(numbers):
    res = []
	
    for num in numbers:
        if num >= 0:
            res.append(num)
	
    return res

Using immutable data structures it would look like this instead:

def positive(numbers):
    res = []
	
    for num in numbers:
        if num >= 0:
            res = res + [num]
	
    return res

The difference is just one line, but it is significant. Instead of modifying the object, we create a new one every time we want to add something. In this example we do not actually gain anything, but the point is that by creating a copy we avoid affecting other code that may hold references to the list. With persistent data structures this style is relatively efficient, but still not as fast as direct mutation.

Now someone will surely remark that the second snippet is not functional either, since there is an assignment in the loop body. But assignment is unproblematic as long as it is local to a function. It is “spooky action at a distance” we seek to eliminate.
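To make “spooky action at a distance” concrete, here is a small example (my own, not from the post above) of the kind of bug that non-local mutation enables:

```python
def add_discount(prices):
    # Mutates the caller's list in place: every other reference to the
    # same list changes too, whether its holder expected it or not.
    for i in range(len(prices)):
        prices[i] = prices[i] * 0.9

catalog = [100.0, 250.0]
checkout = catalog          # an alias, not a copy
add_discount(checkout)
# catalog is now [90.0, 225.0] as well, far from the call site
```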

The second reason persistent data structures are not the answer is efficiency: even with structural sharing, they carry a constant-factor overhead over plain mutable arrays and hash tables.
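The cost gap is easy to see with plain Python lists, where the copy-on-append style from the earlier snippet degrades to quadratic time. This instrumented version (my own sketch) counts the element copies:

```python
def positive_copying(numbers):
    # Copy-on-append: each `res + [num]` copies the whole list,
    # so n appends cost O(n^2) element copies in total.
    res = []
    copies = 0
    for num in numbers:
        if num >= 0:
            copies += len(res)  # elements copied by this one step
            res = res + [num]
    return res, copies

result, copied = positive_copying(range(1000))
# 0 + 1 + ... + 999 = 499500 element copies for a 1000-element result
```

A persistent vector softens this to roughly logarithmic work per step, but the constants are still higher than a single in-place append.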

Locked up in a monad

Haskell is famous for being purely functional, but you can nevertheless use mutable data structures through the ST monad.

Similar to the Python example above, a local object is created, mutated through various steps, and returned when done. The monad makes sure the object does not leak out as the result is being constructed, and freezes it before it is released. Since it cannot have external references, it can be frozen directly without making a copy.

The downside is that you have to accept inversion of control. The function that builds up the result is no longer driving the process; instead, it gets called by the monad step by step.

Using an indirect style for this kind of thing would be very foreign to Pythonistas, so this is not the solution either.

Scoped mutation

The JavaScript library Immer achieves a similar result without monads. A function called produce creates a new draft object to hold the result and passes it to a recipe function that is responsible for setting it up. Since the draft was just allocated, it is safe to mutate. It is then frozen before being returned.

import {produce} from "immer"

const nextState = produce(baseState, draft => {
    draft[1].done = true
    draft.push({title: "Tweet about it"})
})

It is a great fit for JavaScript, but since Python’s lambda syntax is restricted to single-expression bodies, it cannot be ported directly.
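For comparison, here is a hypothetical Python sketch of an Immer-style produce (the name and behavior are my invention, simplified to a deep copy rather than structural sharing). Because the recipe needs statements, it must be a named function rather than a lambda, which is exactly what makes a direct port awkward:

```python
from copy import deepcopy

def produce(base, recipe):
    # Hypothetical sketch: copy the base state, let the recipe mutate the
    # copy freely, and return it. A real implementation like Immer would
    # use structural sharing and freeze the result before returning.
    draft = deepcopy(base)
    recipe(draft)
    return draft

base_state = [
    {"title": "Learn Immer", "done": False},
]

# The recipe body contains statements, so it cannot be a Python lambda.
def recipe(draft):
    draft[0]["done"] = True
    draft.append({"title": "Tweet about it"})

next_state = produce(base_state, recipe)
# base_state is untouched; next_state carries the changes
```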

Freezing regions

Another idea is to partition the heap into regions: separate groups of objects. If an object is constructed inside a fresh region, the resulting object graph is self-contained, and we can avoid copying it when we return the frozen result.

What makes this really interesting is that a recent paper suggested adding regions to Python itself. The authors focus on concurrency, but it looks like it should be straightforward to support Immer-style object construction using the Region class from the paper. I imagine it would look something like this:

def positive(numbers):
    region = Region()
    region.res = []
	
    for num in numbers:
        if num >= 0:
            region.res.append(num)
	
    freeze(region)
	
    return region.res

This solution is nice and Pythonic, just a bit verbose. The biggest drawback is that mistakes are only caught at runtime.
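To illustrate what failing at runtime means, here is a toy stand-in for the Region class (my own sketch, not the paper’s API): it tracks a frozen flag and rejects writes after freeze, so a violation only surfaces when the offending statement actually executes:

```python
class Region:
    # Toy sketch only: it guards attribute rebinding on the region,
    # not deep mutation of the objects stored inside it.
    def __init__(self):
        object.__setattr__(self, "_frozen", False)

    def __setattr__(self, name, value):
        if self._frozen:
            raise RuntimeError(f"region is frozen; cannot assign {name!r}")
        object.__setattr__(self, name, value)

def freeze(region):
    object.__setattr__(region, "_frozen", True)

region = Region()
region.res = [1, 2, 3]
freeze(region)
# region.res = []  would now raise RuntimeError, but only at runtime
```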

Freezing with types

Any viable region design for Python has to work with dynamic typing and the existing interpreter. But Newton is statically typed, which I think will enable a much better solution. If we are lucky, the construct for building up values could just be a plain function returning a frozen object.

def positive(numbers):
    let res = []
	
    for num in numbers:
        if num >= 0:
            res.append(num)
	
    return frozen(res)

For this to be safe and efficient, the type system must be able to prove that res can only escape through the return statement, in other words that there are no other references to it.
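Here is a hypothetical example of the kind of leak such a type system would have to reject, written in plain Python to show the aliasing bug that makes in-place freezing unsound:

```python
observers = []

def positive_leaky(numbers):
    res = []
    observers.append(res)  # res escapes: a second reference now exists
    for num in numbers:
        if num >= 0:
            res.append(num)
    return res  # freezing res in place would be unsound here

frozen = positive_leaky([1, -2, 3])
observers[0].append(99)  # the "frozen" result changes behind our back
```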

I have not worked out the details yet, but it seems uniqueness typing and regions would fit the bill. The folks at Jane Street published an interesting paper on retrofitting OCaml with uniqueness and regions using modes, so that is probably where I should start.

Conclusion

In many situations, mutability is only needed during the construction of an object, after which it does not need to be modified again. By exploiting this phase distinction we can have deeply immutable data types without splitting the language in two.

Stefan