I believe that the set of languages which are easy to reason about
formally and the set of languages in which it is easy to write correct
programs intersect heavily. For this reason, I advocate *total
programming* in *purely functional* programming languages, as
this gives rise to a drastically simpler model for formal reasoning
compared to mainstream imperative programming, as well as compared to
programming in impure functional languages such as ML which do not
separate evaluation from execution of effects.

To illustrate my point, I will verify the correctness of a simple program, first in an imperative setting and then in a purely functional setting. Then I will show how we can embed effectful, nonterminating, or nondeterministic computations inside a total, purely functional substrate, and discuss the advantages of this approach.

## Floyd-style Imperative Verification

As an example, I will verify the program , which iteratively adds up all natural numbers to a fixed . The correctness condition is that the result of the computation should be the same as the arithmetic series:

Using an extremely simple imperative language, with only six
constructs^{1},

The program is simply expressed as:

The are a number of ways to ascribe semantics to such a program, but
for our purposes we will use a *state transition diagram* in the
style of Floyd
(1967). Hoare
(1969) developed an equivalent axiomatic semantics that works
over language syntax, but I’ll use Floyd’s method as it is more visual
and more immediately intuitive. A *state transition diagram*
consists of a set of *states*
, *initial* states
, *final* states
, a set of variable assignments (*stores*)
, and a set
of *transitions*, where a transition is a quadruple
where the guard
is a predicate
and the store update
is a function
.

The diagram for the above program is:

The state sets are , and the set of transitions is:

As the projection notation is somewhat difficult to read, I shall use the semantic brackets to denote the store-dependent function corresponding to an expression involving store projections. For example, is equivalent to . Thus becomes:

The correctness statement for an imperative program takes the form of
a *precondition* and a *postcondition*, both predicates on
the store. In this example, the correctness statement is:

That is, under all circumstances, will leave once it has terminated.

To prove correctness, each state is given an associated
*assertion*
. The assertions on the initial states are implied by the precondition,
in this case
, and the assertions on the final states in turn imply the
postcondition, in this case

We now must devise some assertions for
and
, such that the *assertion network is inductive*, that is, each
transition takes us from one assertion to the next:

Determining the assertion for
(the *loop invariant*) requires some creativity. Note from the
final requirement in the list that

So, one might be tempted to simply set , thus trivially implying the desired conclusion, however this breaks the first requirement

as for any . So, the assertion must be the only other sensible option: set . This satisfies all the requirements, and easily prescribes an assertion for as well:

As this assertion network is inductive, I have shown *partial
correctness*, that is, *if the program terminates*, it will
have a correct result, *but I have still not proven termination!*
To prove termination, I must additionally choose a well-ordered set^{2}
and associate with each state a function
, which need only be defined when
is true. For this example, I choose for
the set of pairs
, where the ordering is lexicographic comparison^{3},
and ranking functions as indicated in the following diagram:

Then show, for each transition , that:

In other words, that each transition decreases the rank, according to the wellfounded order. Wellfoundedness implies there is a point, after some finite number of transitions has been taken, where the rank reaches the minimum element in the ordering – it can no longer be decreased. Thus, it is impossible to continue taking transitions beyond that point, as all transitions must decrease the rank. Therefore, the program always terminates after a finite amount of transitions.

These obligations are trivial to discharge in our example, which
means the algorithm is at last proven correct. This verification was
*a lot of work* to prove something *extremely simple*! I
had to define a state transition system, assertion network, loop
invariants, a termination order, ranking functions, and prove a bunch of
irritating obligations.

## The Purely Functional Approach

For the purely functional version of this algorithm, I will assume the simply typed lambda calculus of Church (1940) equipped with an inductive type for natural numbers , with constructors and , and a catamorphism . The typing rules are as follows:

The program
is defined as^{4}:

The semantics of this language are given via a term rewriting system, where is a capture avoiding substitution of for inside .

As this term rewriting system is *confluent*, these rules can
be applied to any subexpression in any order and still end up at the
same result, and as well-typed terms are *strongly normalising*,
the rules only need be applied a finite number of times before the
program halts. The proof of this is nontrivial, but it is a property of
the ambient calculus, rather than a property of the program being
verified. This means I do not need to prove termination, as it would be
impossible to express a non-terminating program in this language^{5}^{6}.

Now the correctness statement is, for all :

Proving the stronger statement

instead gives better induction characteristics, and it still implies the original correctness statement.

By induction, when , we have:

When , we have the induction hypothesis that:

Using this, we can show our goal for :

Thus, the program is correct, by induction on the input . The beauty of this approach is that there is nothing in this proof method that would be unfamiliar to anyone who studied a little discrete mathematics and high-school algebra. The lambda calculus is just like an algebraic expression language, and we can reason about it much like we would reason about conventional algebraic expressions: induction, substitution, equational reasoning, and so on.

In addition, because we are reasoning in a *confluent*
rewriting system, the order in which subexpressions are reduced *does
not matter*. This is not just convenient for reasoning, it also
means that a compiler for this language is free to evaluate expressions
in whatever order it likes - be it strict evaluation, parallel
evaluation, call-by-need, or anything else. Optimisations such as loop
fusion and deforestation are possible, because the compiler is free to
rewrite inefficient expressions to an efficient expression, so long as
it is reduction-equivalent. Code can be trivially parallelised by adding
annotations.

## Bringing Back Effects

These all seem like desirable properties, but there’s a sense that in order to achieve those benefits, we’ve divorced ourselves from the real world. We’ve limited ourselves to a total, deterministic, effect-free language, but a language like that is merely a space-heater - it can’t even display the results it has computed!

A common way to add some of this missing power to a lambda calculus
is to add some side-effects to evaluation. This is the approach taken by
impure functional languages such as ML. These languages simply add to
the lambda calculus some primitives that represent assignment, and types
that represent references (mutable variables)^{7}:

To specify semantics for these operations, however, we must bring back our old friend , the store of variable assignments.

This in turn forces us to be more prescriptive about the order in
which expressions are evaluated, making the rewrite system into an
*operational semantics*:

Without this explicit ordering, we could derive two different results
from the same computation, simply by evaluating assignments in a
different order. Now that we’ve added an ordering, however, *we lose
the advantages of the pure model*. We have prescribed a specific
evaluation model (in this case strict evaluation) on our expressions, so
we can no longer reduce arbitrary subexpressions while reasoning,
optimisations such as fusion are much more difficult to perform soundly,
the user has no control over evaluation (for example for parallelism or
laziness), and to properly reason about our programs we are forced to
use cumbersome state-assertion reasoning frameworks, just like in
imperative languages.

An ideal language supports *both* purely functional programs
where that’s appropriate, *and* impure imperative programs where
that’s appropriate, with sound reasoning about either style as
necessary. With ML’s approach, the imperative reasoning *infects*
the purely functional world, and complicates the programmer’s mental
model.

A better approach is to *embed* an imperative language inside
the lambda calculus as a separate type. As this embedding requires
polymorphism, we’ll switch from the simply typed lambda calculus to a
variation on the Girard-Reynolds polymorphic lambda calculus (Reynolds 1974), with data
types and syntax loosely inspired by that of Isabelle/HOL (Nipkow *et al.*
2002).

The type
, denotes an *embedded imperative computation*^{8}
which, when *executed*, results in a value of type
. We use the infix operator
for
, and
as a shorthand for
.

Then, an imperative can be expressed directly in this embedded language, notably doing away with the reference :

Note that the reduction rules for the underlying lambda calculus are
*unchanged* - The embedded
computation is *not* executed when it is evaluated! Instead, we
have a *pure* computation, which returns an *impure*
computation in an embedded language. This impure computation is then
interpreted by an entirely *separate* set of semantics, defined
operationally as the relation
.

statements only progress when the condition is true, returning the given value.

statements execute the first action until it returns a result, and then use that result to determine the next action:

Nondeterministic choice is modelled by having two rules, both of which are equally applicable to a given state:

Loops are given semantics by simple expansion to nondeterministic choice:

And and behave exactly as one might expect:

Note that these semantics say nothing about *evaluation of
expressions*, only the *execution of actions*. The top-level
interpretation for this programming language simply *interleaves*
the execution of actions and the evaluation of expressions. This is
because expression evaluation is purely functional - the embedded
language doesn’t care about the order of expression evaluation, because
it doesn’t change the result of the program! This means we have kept all
the benefits of total, deterministic, purely functional programming,
while still being able to express partial, nondeterministic, imperative
programs using the embedded language when necessary. We have the best of
both worlds!

The drawback is that for imperative programs that *are* total,
terminating, deterministic and otherwise easy to reason about, such as
, the reasoning tools available are for the more general
language - we’d have to prove termination, totality, determinism and so
on *for each program we write*. We have all the reasoning
benefits of purely functional programming for purely functional
programs, but the moment a single mutable variable sneaks its way into a
program, it has to go back to the
sin-bin. The solution is in *another* embedded language, but a
slightly less unwieldy one – terminating and deterministic:

The semantics for are given simply as a mapping into :

Of course, it is necessary to prove that the output of
is indeed a deterministic
computation that always terminates, but once this is proven, there is no
longer any need to prove these properties for each of our programs.
Simply by *expressing*
in
, we have shown it to terminate:

One could also imagine using similarly restricted languages to
ensure, by construction, that more domain-specific properties are
satisfied – The use of restricted languages to generalise verification
of such properties from *one program* to *many programs*
is precisely the method of the Trustworthy File Systems project I work
on at NICTA (Keller
*et al.* 2013).

This doesn’t just give us the best of both worlds, it lets us pick
one of an infinite variety of worlds to live in – the world of
terminating imperative programs, the world of purely functional
programs, the world of nondeterministic programs, or even the world of
STM
transactions, where access to IO must be restricted for the
transactional abstraction to be safe (Harris *et al.* 2005). We can pick
whatever sublanguage we want, with whatever reasoning characteristics we
want, and thus easily establish properties about our programs almost for
free – they come from the language used, not the program definition. In
addition, as Wadler
(1995) notes, these languages all have the same monadic
structure, which allows high-level operations to be expressed in terms
of the monad abstraction rather than in terms of any particular
language.

All of this is only possible because we have a pure, total substrate within which these languages are embedded. I could define a language or an language but it would be useless if the ambient lambda calculus was not so restricted – there is no way to prevent IO during an STM transaction or a diverging expression from being invoked from inside a , when evaluation could have such an effect.

Lastly, it’s worth noting that the Haskell programming language
differs from the lambda calculus used here in several important
respects. Chiefly, it allows partiality and general recursion, making
the language turing-complete. This still falls under the definition of
“purely functional” – indeed, its non-strict semantics make the sort of
reasoning I used here relatively harmless, even in the presence of
“bottom” values arising from nontermination or partiality (see Danielsson *et
al.* 2006). Turner
(2004) has advocated strongly for language-enforced totality, but
my impression is that total programming is still a fledgling idea, with
some distance to cross before it is practical.

#### References

*The Journal of Symbolic Logic*,

**5**(2), 56–68.

*Conference record of the 33rd ACM SIGPLAN SIGACT symposium on principles of programming languages*, New York, NY, USA: ACM, pp. 206–217.

*Mathematical Aspects of Computer Science, Proceedings of Symposia in Applied Mathematics 19*, Providence: American Mathematical Society, pp. 19–32.

*Proceedings of the tenth ACM SIGPLAN symposium on principles and practice of parallel programming*, New York, NY, USA: ACM, pp. 48–60.

*Communications of the ACM*,

**12**(10), 576–580.

*Programming languages and operating systems (PLOS)*.

*Isabelle/HOL: A Proof Assistant for Higher-Order Logic*, Vol. 2283, Springer.

*Programming symposium, proceedings colloque sur la programmation*, London, UK, UK: Springer-Verlag, pp. 408–423.

*Journal of Universal Computer Science*,

**10**, 187–209.

*Advanced functional programming summer school*, London, UK, UK: Springer-Verlag, pp. 24–52.

The variables and stand for arithmetic and boolean expressions, respectively.↩︎

That is, a set equipped with a greater-than relation such that each descending chain is finite.↩︎

True is considered greater than False.↩︎

With standard implementation of addition: ↩︎

Even if we added a fixpoint loop combinator as a primitive, making the language turing-complete, we could easily prove termination of by noting that it never uses it.↩︎

The imperative language could have used a

*bounded*loop primitive instead of the unbounded one, and also been automatically terminating, but this is not reflective of idiomatic imperative programming, where the same looping construct is used for both infinite and terminating loops. We will examine terminating imperative languages later in the article.↩︎Destructive update is perhaps the thorniest effect to deal with, so handling for other effects like console output are left as an exercise.↩︎

It just so happens that this embedded language forms a monad, but monads are not the salient point here.↩︎