# Imperative Reasoning is Hard

I believe that the set of languages which are easy to reason about formally and the set of languages in which it is easy to write correct programs intersect heavily. For this reason, I advocate *total programming* in *purely functional* programming languages, as this gives rise to a drastically simpler model for formal reasoning compared to mainstream imperative programming, as well as compared to programming in impure functional languages such as ML which do not separate evaluation from execution of effects.

To illustrate my point, I will verify the correctness of a simple program, first in an imperative setting and then in a purely functional setting. Then I will show how we can embed effectful, nonterminating, or nondeterministic computations inside a total, purely functional substrate, and discuss the advantages of this approach.

## Floyd-style Imperative Verification

As an example, I will verify the program , which iteratively adds up all natural numbers to a fixed . The correctness condition is that the result of the computation should be the same as the arithmetic series:

Using an extremely simple imperative language, with only six constructs^{1},

The program is simply expressed as:

The are a number of ways to ascribe semantics to such a program, but for our purposes we will use a *state transition diagram* in the style of Floyd [1967]. Hoare [1969] developed an equivalent axiomatic semantics that works over language syntax, but I’ll use Floyd’s method as it is more visual and more immediately intuitive. A *state transition diagram* consists of a set of *states* , *initial* states , *final* states , a set of variable assignments (*stores*) , and a set of *transitions*, where a transition is a quadruple where the guard is a predicate and the store update is a function .

The diagram for the above program is:

The state sets are , and the set of transitions is:

As the projection notation is somewhat difficult to read, I shall use the semantic brackets to denote the store-dependent function corresponding to an expression involving store projections. For example, is equivalent to . Thus becomes:

The correctness statement for an imperative program takes the form of a *precondition* and a *postcondition*, both predicates on the store. In this example, the correctness statement is:

That is, under all circumstances, will leave once it has terminated.

To prove correctness, each state is given an associated *assertion* . The assertions on the initial states are implied by the precondition, in this case , and the assertions on the final states in turn imply the postcondition, in this case

We now must devise some assertions for and , such that the *assertion network is inductive*, that is, each transition takes us from one assertion to the next:

Determining the assertion for (the *loop invariant*) requires some creativity. Note from the final requirement in the list that

So, one might be tempted to simply set , thus trivially implying the desired conclusion, however this breaks the first requirement

as $1 $ for any . So, the assertion must be the only other sensible option: set . This satisfies all the requirements, and easily prescribes an assertion for as well:

As this assertion network is inductive, I have shown *partial correctness*, that is, *if the program terminates*, it will have a correct result, *but I have still not proven termination!* To prove termination, I must additionally choose a well-ordered set^{2} and associate with each state a function , which need only be defined when is true. For this example, I choose for the set of pairs , where the ordering is lexicographic comparison^{3}, and ranking functions as indicated in the following diagram:

Then show, for each transition , that:

In other words, that each transition decreases the rank, according to the wellfounded order. Wellfoundedness implies there is a point, after some finite number of transitions has been taken, where the rank reaches the minimum element in the ordering — it can no longer be decreased. Thus, it is impossible to continue taking transitions beyond that point, as all transitions must decrease the rank. Therefore, the program always terminates after a finite amount of transitions.

These obligations are trivial to discharge in our example, which means the algorithm is at last proven correct. This verification was *a lot of work* to prove something *extremely simple*! I had to define a state transition system, assertion network, loop invariants, a termination order, ranking functions, and prove a bunch of irritating obligations.

## The Purely Functional Approach

For the purely functional version of this algorithm, I will assume the simply typed lambda calculus of Church [1940] equipped with an inductive type for natural numbers , with constructors and , and a catamorphism . The typing rules are as follows:

The program is defined as^{4}:

The semantics of this language are given via a term rewriting system, where is a capture avoiding substitution of for inside .

As this term rewriting system is *confluent*, these rules can be applied to any subexpression in any order and still end up at the same result, and as well-typed terms are *strongly normalising*, the rules only need be applied a finite number of times before the program halts. The proof of this is nontrivial, but it is a property of the ambient calculus, rather than a property of the program being verified. This means I do not need to prove termination, as it would be impossible to express a non-terminating program in this language^{5}^{6}.

Now the correctness statement is, for all :

Proving the stronger statement

instead gives better induction characteristics, and it still implies the original correctness statement.

By induction, when , we have:

When , we have the induction hypothesis that:

Using this, we can show our goal for :

Thus, the program is correct, by induction on the input . The beauty of this approach is that there is nothing in this proof method that would be unfamiliar to anyone who studied a little discrete mathematics and high-school algebra. The lambda calculus is just like an algebraic expression language, and we can reason about it much like we would reason about conventional algebraic expressions: induction, substitution, equational reasoning, and so on.

In addition, because we are reasoning in a *confluent* rewriting system, the order in which subexpressions are reduced *does not matter*. This is not just convenient for reasoning, it also means that a compiler for this language is free to evaluate expressions in whatever order it likes - be it strict evaluation, parallel evaluation, call-by-need, or anything else. Optimisations such as loop fusion and deforestation are possible, because the compiler is free to rewrite inefficient expressions to an efficient expression, so long as it is reduction-equivalent. Code can be trivially parallelised by adding annotations.

## Bringing Back Effects

These all seem like desirable properties, but there’s a sense that in order to achieve those benefits, we’ve divorced ourselves from the real world. We’ve limited ourselves to a total, deterministic, effect-free language, but a language like that is merely a space-heater - it can’t even display the results it has computed!

A common way to add some of this missing power to a lambda calculus is to add some side-effects to evaluation. This is the approach taken by impure functional languages such as ML. These languages simply add to the lambda calculus some primitives that represent assignment, and types that represent references (mutable variables)^{7}:

To specify semantics for these operations, however, we must bring back our old friend , the store of variable assignments.

This in turn forces us to be more prescriptive about the order in which expressions are evaluated, making the rewrite system into an *operational semantics*:

Without this explicit ordering, we could derive two different results from the same computation, simply by evaluating assignments in a different order. Now that we’ve added an ordering, however, *we lose the advantages of the pure model*. We have prescribed a specific evaluation model (in this case strict evaluation) on our expressions, so we can no longer reduce arbitrary subexpressions while reasoning, optimisations such as fusion are much more difficult to perform soundly, the user has no control over evaluation (for example for parallelism or laziness), and to properly reason about our programs we are forced to use cumbersome state-assertion reasoning frameworks, just like in imperative languages.

An ideal language supports *both* purely functional programs where that’s appropriate, *and* impure imperative programs where that’s appropriate, with sound reasoning about either style as necessary. With ML’s approach, the imperative reasoning *infects* the purely functional world, and complicates the programmer’s mental model.

A better approach is to *embed* an imperative language inside the lambda calculus as a separate type. As this embedding requires polymorphism, we’ll switch from the simply typed lambda calculus to a variation on the Girard-Reynolds polymorphic lambda calculus [Reynolds 1974], with data types and syntax loosely inspired by that of Isabelle/HOL [Nipkow et al. 2002].

The type , denotes an *embedded imperative computation* which, when *executed*, results in a value of type . We use the infix operator for , and as a shorthand for .

Then, an imperative can be expressed directly in this embedded language, notably doing away with the reference :

Note that the reduction rules for the underlying lambda calculus are *unchanged* - The embedded computation is *not* executed when it is evaluated! Instead, we have a *pure* computation, which returns an *impure* computation in an embedded language. This impure computation is then interpreted by an entirely *separate* set of semantics, defined operationally as the relation .

statements only progress when the condition is true, returning the given value.

statements execute the first action until it returns a result, and then use that result to determine the next action:

Nondeterministic choice is modelled by having two rules, both of which are equally applicable to a given state:

Loops are given semantics by simple expansion to nondeterministic choice:

And and behave exactly as one might expect:

Note that these semantics say nothing about *evaluation of expressions*, only the *execution of actions*. The top-level interpretation for this programming language simply *interleaves* the execution of actions and the evaluation of expressions. This is because expression evaluation is purely functional - the embedded language doesn’t care about the order of expression evaluation, because it doesn’t change the result of the program! This means we have kept all the benefits of total, deterministic, purely functional programming, while still being able to express partial, nondeterministic, imperative programs using the embedded language when necessary. We have the best of both worlds!

The drawback is that for imperative programs that *are* total, terminating, deterministic and otherwise easy to reason about, such as , the reasoning tools available are for the more general language - we’d have to prove termination, totality, determinism and so on *for each program we write*. We have all the reasoning benefits of purely functional programming for purely functional programs, but the moment a single mutable variable sneaks its way into a program, it has to go back to the sin-bin. The solution is in *another* embedded language, but a slightly less unwieldy one — terminating and deterministic:

The semantics for are given simply as a mapping into :

Of course, it is necessary to prove that the output of is indeed a deterministic computation that always terminates, but once this is proven, there is no longer any need to prove these properties for each of our programs. Simply by *expressing* in , we have shown it to terminate:

One could also imagine using similarly restricted languages to ensure, by construction, that more domain-specific properties are satisfied — The use of restricted languages to generalise verification of such properties from *one program* to *many programs* is precisely the method of the Trustworthy File Systems project I work on at NICTA [Keller et al. 2013].

This doesn’t just give us the best of both worlds, it lets us pick one of an infinite variety of worlds to live in — the world of terminating imperative programs, the world of purely functional programs, the world of nondeterministic programs, or even the world of STM transactions, where access to IO must be restricted for the transactional abstraction to be safe [Harris et al. 2005]. We can pick whatever sublanguage we want, with whatever reasoning characteristics we want, and thus easily establish properties about our programs almost for free — they come from the language used, not the program definition. In addition, as Wadler [1995] notes, these languages all have the same monadic structure, which allows high-level operations to be expressed in terms of the monad abstraction rather than in terms of any particular language.

All of this is only possible because we have a pure, total substrate within which these languages are embedded. I could define a language or an language but it would be useless if the ambient lambda calculus was not so restricted — there is no way to prevent IO during an STM transaction or a diverging expression from being invoked from inside a , when evaluation could have such an effect.

Lastly, it’s worth noting that the Haskell programming language differs from the lambda calculus used here in several important respects. Chiefly, it allows partiality and general recursion, making the language turing-complete. This still falls under the definition of “purely functional” — indeed, its non-strict semantics make the sort of reasoning I used here relatively harmless, even in the presence of “bottom” values arising from nontermination or partiality [see Danielsson et al. 2006]. Turner [2004] has advocated strongly for language-enforced totality, but my impression is that total programming is still a fledgling idea, with some distance to cross before it is practical.

#### References

Church, A., 1940. A Formulation of the Simple Theory of Types. *The Journal of Symbolic Logic*, 5(2), pp.56–68.

Danielsson, N.A., Hughes, J., Jansson, P. and Gibbons, J., 2006. Fast and loose reasoning is morally correct. In *Conference Record of the 33rd ACM SIGPLAN SIGACT Symposium on Principles of Programming Languages*. POPL ’06. New York, NY, USA: ACM, pp. 206–217.

Floyd, R.W., 1967. Assigning meanings to programs. In J. T. Schwartz, ed. *Mathematical Aspects of Computer Science, Proceedings of Symposia in Applied Mathematics 19*. Providence: American Mathematical Society, pp. 19–32.

Harris, T., Marlow, S., Peyton-Jones, S. and Herlihy, M., 2005. Composable memory transactions. In *Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming*. PPoPP ’05. New York, NY, USA: ACM, pp. 48–60.

Hoare, C.A.R., 1969. An axiomatic basis for computer programming. *Communications of the ACM*, 12(10), pp.576–580.

Keller, G., Murray, T., Amani, S., O’Connor, L., Chen, Z., Ryzhyk, L., Klein, G. and Heiser, G., 2013. File Systems Deserve Verification Too! In *Programming Languages and Operating Systems (PLOS)*.

Nipkow, T., Paulson, L.C. and Wenzel, M., 2002. *Isabelle/HOL: A Proof Assistant for Higher-Order Logic*, Springer.

Reynolds, J.C., 1974. Towards a theory of type structure. In *Programming Symposium, Proceedings Colloque Sur La Programmation*. London, UK, UK: Springer-Verlag, pp. 408–423.

Turner, D., 2004. Total functional programming. *Journal of Universal Computer Science*, 10, pp.187–209.

Wadler, P., 1995. Monads for functional programming. In *Advanced Functional Programming Summer School*. London, UK, UK: Springer-Verlag, pp. 24–52.

The variables and stand for arithmetic and boolean expressions, respectively.↩

That is, a set equipped with a greater-than relation such that each descending chain is finite.↩

True is considered greater than False.↩

With standard implementation of addition: ↩

Even if we added a fixpoint loop combinator as a primitive, making the language turing-complete, we could easily prove termination of by noting that it never uses it.↩

The imperative language could have used a

*bounded*loop primitive instead of the unbounded one, and also been automatically terminating, but this is not reflective of idiomatic imperative programming, where the same looping construct is used for both infinite and terminating loops. We will examine terminating imperative languages later in the article.↩Destructive update is perhaps the thorniest effect to deal with, so handling for other effects like console output are left as an exercise.↩