Four layers of Python

One way to think about Python is that it’s built up from progressively larger sublanguages, like the layers of an onion. It turns out that we can peel off the outermost layer and still have something useful, perhaps even more useful for some purposes.

Four layers

Viewed this way, Python has four layers wrapped around a core of built-in primitives:

  • The expression layer
  • The statement layer
  • The definition layer
  • The library layer

The expression layer consists of literals, arithmetic, function calls, comprehensions, and so on. Everyone uses this subset.

The statement layer adds assignments and control flow, such as conditionals, loops, and exception handling.

The definition layer lets you define new functions and classes, and overload operators.

The library layer is mostly used for writing packages of reusable functionality. It lets you do metaprogramming, load foreign functions, and customise how classes are created. I would put reflection in this category too.
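To make the four layers concrete, here is a small sketch with one fragment per layer. The names (Vec, logged, area) are made up for illustration, and where exactly each construct belongs is a judgment call rather than anything official.

```python
import functools

# Expression layer: literals, arithmetic, calls, comprehensions.
squares = [n * n for n in range(5)]

# Statement layer: modifying a variable, plus control flow.
total = 0
for value in squares:
    if value % 2 == 0:
        total += value

# Definition layer: a new class with an overloaded operator.
class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

# Library layer: a decorator that generates a wrapper function.
def logged(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1          # count how often the function is called
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@logged
def area(v):
    return v.x * v.y

corner = Vec(1, 2) + Vec(2, 2)
result = area(corner)
field_names = sorted(vars(corner))  # reflection: ask the object for its fields
```

Everything above the decorator would still work if the library layer did not exist.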

Assignments are a bit special. I would put assignments that bind new variables in the expression layer, and those that modify an existing variable in the statement layer. But assignments can also add methods to classes and be used for other very dynamic behavior, which I would classify as belonging to the library layer.
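The three kinds of assignment look almost identical on the page, which is part of what makes them special. A minimal sketch, with a hypothetical Robot class:

```python
# Binding a new name: the innermost kind of assignment.
greeting = "hello"

# Modifying an existing variable: statement layer.
greeting = greeting + ", world"
greeting += "!"

class Robot:
    pass

unit = Robot()

# Assigning to a class attribute adds a method to every instance,
# including ones that already exist: this is the very dynamic kind.
Robot.speak = lambda self: greeting.upper()

spoken = unit.speak()
```

Note that unit was created before speak existed, yet it can still call it.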

Each layer is useful

Most Python programmers only use the first three layers, and it is interesting to note that each of them is useful on its own, working from the inside out.

For instance, the expression subset could be used as a configuration language, or as the formula language of a spreadsheet. You don’t need the other layers for this purpose.
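As a rough sketch of the configuration use case, the standard ast module can parse a string in expression-only mode and check it against a whitelist of node types before evaluating it. The evaluate function and the ALLOWED list are my own illustration, not a complete or secure sandbox.

```python
import ast

# Node types accepted by this sketch; anything else is rejected.
ALLOWED = (
    ast.Expression, ast.Constant, ast.Name, ast.Load, ast.Store,
    ast.BinOp, ast.UnaryOp, ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
    ast.List, ast.Tuple, ast.Dict, ast.ListComp, ast.comprehension,
    ast.Call,
)

def evaluate(source, names):
    """Evaluate an expression-only snippet with the given names in scope."""
    tree = ast.parse(source, mode="eval")   # statements are a SyntaxError here
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"not allowed: {type(node).__name__}")
    code = compile(tree, "<config>", "eval")
    # No builtins are exposed; callers pass in whatever should be visible.
    return eval(code, {"__builtins__": {}, **names})

settings = evaluate(
    "{'workers': cores * 2, 'ports': [base + i for i in range(3)]}",
    {"cores": 4, "base": 8000, "range": range},
)
```

If plain literals are enough, ast.literal_eval from the standard library covers that case without any custom code.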

Add the statement layer and you can do simple scripting, as an alternative to shell scripts or batch files. You can stay in the inner two layers and still do meaningful work using only core functions and data types.

The library layer

Definitions from outer layers appear in inner layers as identifiers, whose implementation can be replaced as long as the replacement is compatible. The expression layer doesn’t care if a function was defined in the third layer, generated by a decorator in the fourth layer, or was built into the core.

This goes beyond just functions. In most cases it does not matter whether decorators are a load-time construct, as they are today, or compile-time macros.
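A small sketch of that indifference: the same expression is evaluated against a built-in, a function defined with def, and a function generated by a decorator. The names largest_defined and counted are invented for the example.

```python
import functools

def largest_defined(values):
    """Definition layer: an ordinary function."""
    best = values[0]
    for value in values[1:]:
        if value > best:
            best = value
    return best

def counted(func):
    """Library layer: a decorator that generates a wrapper function."""
    @functools.wraps(func)
    def wrapper(values):
        wrapper.calls += 1
        return func(values)
    wrapper.calls = 0
    return wrapper

largest_generated = counted(largest_defined)

results = []
for largest in (max, largest_defined, largest_generated):
    # The expression below is identical for all three implementations.
    results.append(largest([3, 9, 4]) * 2)
```

All three entries in results come out the same, so the expression has no way to tell which layer supplied the function.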

This means we may be able to replace or peel off the outer layer entirely with limited impact on the layers below. The value of existing skills and source code could be preserved.

This matters, because the library layer is where most of the problems with Python appear, from a compiler writer’s point of view.

Not all languages

You could argue that all languages are layered like this, but I don’t think that’s true. The expression subset of C, for instance, is too bare-bones to be useful on its own. You need collection literals and comprehensions to be able to do more than just arithmetic. There must be a written representation for composite values so that they can be typed in and displayed.

The various subsets could also be in competition with each other, instead of layered. Scala comes to mind.

Python without definitions

Leaving out the top layers has more benefits than just limiting the scope of work. Not giving the user any means of extension lets the compiler make stronger assumptions and rule out possibilities. For instance, field locations would be a simple base + offset calculation if fields could not be added at runtime.
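Today's Python already offers an opt-in approximation of this with __slots__, which fixes the set of fields when the class is created. A minimal sketch with two hypothetical classes; as far as I know, CPython stores slot values at fixed offsets inside the instance, but treat that as an implementation detail rather than a guarantee.

```python
class Dynamic:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Fixed:
    __slots__ = ("x", "y")      # the complete list of fields, known up front

    def __init__(self, x, y):
        self.x = x
        self.y = y

d = Dynamic(1, 2)
d.z = 3                          # fine: fields live in a per-instance dict

f = Fixed(1, 2)
try:
    f.z = 3                      # rejected: there is no slot named z
    rejected = False
except AttributeError:
    rejected = True

has_dict = hasattr(f, "__dict__")
```

A language without the outer layers could treat every class like Fixed and never pay for the possibility that someone adds z later.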

Prior art

There have been many projects that re-implement the core, for instance IronPython and Nuitka. The best example of peeling off a layer that I am aware of is Starlark, a build language with Python syntax. There is also GDScript, but it seems to differ from Python much more, and does so in every layer.

Conclusion

The inner layers could be wrapped around a new core and the most dynamic means of extension omitted, and the language would still be useful and familiar. This approach could both shorten the time to market and make the language a viable option in new domains.

Stefan