I believe that the set of languages which are easy to reason about formally and the set of languages in which it is easy to write correct programs intersect heavily. For this reason, I advocate total programming in purely functional programming languages, as this gives rise to a drastically simpler model for formal reasoning compared to mainstream imperative programming, as well as compared to programming in impure functional languages such as ML which do not separate evaluation from execution of effects.
To illustrate my point, I will verify the correctness of a simple program, first in an imperative setting and then in a purely functional setting. Then I will show how we can embed effectful, nonterminating, or nondeterministic computations inside a total, purely functional substrate, and discuss the advantages of this approach.
Floyd-style Imperative Verification
As an example, I will verify the program
, which iteratively adds up all natural numbers to a fixed
. The correctness condition is that the result of the computation should
be the same as the arithmetic series:
Using an extremely simple imperative language, with only six constructs1,
The program
is simply expressed as:
The are a number of ways to ascribe semantics to such a program, but
for our purposes we will use a state transition diagram in the
style of Floyd
(1967). Hoare
(1969) developed an equivalent axiomatic semantics that works
over language syntax, but I’ll use Floyd’s method as it is more visual
and more immediately intuitive. A state transition diagram
consists of a set of states
, initial states
, final states
, a set of variable assignments (stores)
, and a set
of transitions, where a transition is a quadruple
where the guard
is a predicate
and the store update
is a function
.
The diagram for the above
program is:
The state sets are
, and the set of transitions
is:
As the projection notation is somewhat difficult to read, I shall use
the semantic brackets
to denote the store-dependent function corresponding to an expression
involving store projections. For example,
is equivalent to
. Thus
becomes:
The correctness statement for an imperative program takes the form of a precondition and a postcondition, both predicates on the store. In this example, the correctness statement is:
That is, under all circumstances,
will leave
once it has terminated.
To prove correctness, each state is given an associated
assertion
. The assertions on the initial states are implied by the precondition,
in this case
, and the assertions on the final states in turn imply the
postcondition, in this case
We now must devise some assertions for
and
, such that the assertion network is inductive, that is, each
transition takes us from one assertion to the next:
Determining the assertion for
(the loop invariant) requires some creativity. Note from the
final requirement in the list that
So, one might be tempted to simply set
, thus trivially implying the desired conclusion, however this breaks
the first requirement
as
for any
. So, the assertion must be the only other sensible option: set
. This satisfies all the requirements, and easily prescribes an
assertion for
as well:
As this assertion network is inductive, I have shown partial
correctness, that is, if the program terminates, it will
have a correct result, but I have still not proven termination!
To prove termination, I must additionally choose a well-ordered set2
and associate with each state a function
, which need only be defined when
is true. For this example, I choose for
the set of pairs
, where the ordering is lexicographic comparison3,
and ranking functions as indicated in the following diagram:
Then show, for each transition
, that:
In other words, that each transition decreases the rank, according to the wellfounded order. Wellfoundedness implies there is a point, after some finite number of transitions has been taken, where the rank reaches the minimum element in the ordering – it can no longer be decreased. Thus, it is impossible to continue taking transitions beyond that point, as all transitions must decrease the rank. Therefore, the program always terminates after a finite amount of transitions.
These obligations are trivial to discharge in our example, which means the algorithm is at last proven correct. This verification was a lot of work to prove something extremely simple! I had to define a state transition system, assertion network, loop invariants, a termination order, ranking functions, and prove a bunch of irritating obligations.
The Purely Functional Approach
For the purely functional version of this algorithm, I will assume
the simply typed lambda calculus of Church (1940) equipped with an inductive type
for natural numbers
, with constructors
and
, and a catamorphism
. The typing rules are as follows:
The program
is defined as4:
The semantics of this language are given via a term rewriting system,
where
is a capture avoiding substitution of
for
inside
.
As this term rewriting system is confluent, these rules can be applied to any subexpression in any order and still end up at the same result, and as well-typed terms are strongly normalising, the rules only need be applied a finite number of times before the program halts. The proof of this is nontrivial, but it is a property of the ambient calculus, rather than a property of the program being verified. This means I do not need to prove termination, as it would be impossible to express a non-terminating program in this language56.
Now the correctness statement is, for all
:
Proving the stronger statement
instead gives better induction characteristics, and it still implies the original correctness statement.
By induction, when
, we have:
When
, we have the induction hypothesis that:
Using this, we can show our goal for
:
Thus, the program is correct, by induction on the input
. The beauty of this approach is that there is nothing in this proof
method that would be unfamiliar to anyone who studied a little discrete
mathematics and high-school algebra. The lambda calculus is just like an
algebraic expression language, and we can reason about it much like we
would reason about conventional algebraic expressions: induction,
substitution, equational reasoning, and so on.
In addition, because we are reasoning in a confluent rewriting system, the order in which subexpressions are reduced does not matter. This is not just convenient for reasoning, it also means that a compiler for this language is free to evaluate expressions in whatever order it likes - be it strict evaluation, parallel evaluation, call-by-need, or anything else. Optimisations such as loop fusion and deforestation are possible, because the compiler is free to rewrite inefficient expressions to an efficient expression, so long as it is reduction-equivalent. Code can be trivially parallelised by adding annotations.
Bringing Back Effects
These all seem like desirable properties, but there’s a sense that in order to achieve those benefits, we’ve divorced ourselves from the real world. We’ve limited ourselves to a total, deterministic, effect-free language, but a language like that is merely a space-heater - it can’t even display the results it has computed!
A common way to add some of this missing power to a lambda calculus is to add some side-effects to evaluation. This is the approach taken by impure functional languages such as ML. These languages simply add to the lambda calculus some primitives that represent assignment, and types that represent references (mutable variables)7:
To specify semantics for these operations, however, we must bring
back our old friend
, the store of variable assignments.
This in turn forces us to be more prescriptive about the order in which expressions are evaluated, making the rewrite system into an operational semantics:
Without this explicit ordering, we could derive two different results from the same computation, simply by evaluating assignments in a different order. Now that we’ve added an ordering, however, we lose the advantages of the pure model. We have prescribed a specific evaluation model (in this case strict evaluation) on our expressions, so we can no longer reduce arbitrary subexpressions while reasoning, optimisations such as fusion are much more difficult to perform soundly, the user has no control over evaluation (for example for parallelism or laziness), and to properly reason about our programs we are forced to use cumbersome state-assertion reasoning frameworks, just like in imperative languages.
An ideal language supports both purely functional programs where that’s appropriate, and impure imperative programs where that’s appropriate, with sound reasoning about either style as necessary. With ML’s approach, the imperative reasoning infects the purely functional world, and complicates the programmer’s mental model.
A better approach is to embed an imperative language inside the lambda calculus as a separate type. As this embedding requires polymorphism, we’ll switch from the simply typed lambda calculus to a variation on the Girard-Reynolds polymorphic lambda calculus (Reynolds 1974), with data types and syntax loosely inspired by that of Isabelle/HOL (Nipkow et al. 2002).
The type
, denotes an embedded imperative computation8
which, when executed, results in a value of type
. We use the infix operator
for
, and
as a shorthand for
.
Then, an imperative
can be expressed directly in this embedded language, notably doing away
with the reference
:
Note that the reduction rules for the underlying lambda calculus are
unchanged - The embedded
computation is not executed when it is evaluated! Instead, we
have a pure computation, which returns an impure
computation in an embedded language. This impure computation is then
interpreted by an entirely separate set of semantics, defined
operationally as the relation
.
statements only progress when the condition is true, returning the given
value.
statements execute the first action until it returns a result, and then
use that result to determine the next action:
Nondeterministic choice is modelled by having two rules, both of which are equally applicable to a given state:
Loops are given semantics by simple expansion to nondeterministic choice:
And
and
behave exactly as one might expect:
Note that these semantics say nothing about evaluation of
expressions, only the execution of actions. The top-level
interpretation for this programming language simply interleaves
the execution of actions and the evaluation of expressions. This is
because expression evaluation is purely functional - the embedded
language doesn’t care about the order of expression evaluation, because
it doesn’t change the result of the program! This means we have kept all
the benefits of total, deterministic, purely functional programming,
while still being able to express partial, nondeterministic, imperative
programs using the embedded language when necessary. We have the best of
both worlds!
The drawback is that for imperative programs that are total,
terminating, deterministic and otherwise easy to reason about, such as
, the reasoning tools available are for the more general
language - we’d have to prove termination, totality, determinism and so
on for each program we write. We have all the reasoning
benefits of purely functional programming for purely functional
programs, but the moment a single mutable variable sneaks its way into a
program, it has to go back to the
sin-bin. The solution is in another embedded language, but a
slightly less unwieldy one – terminating and deterministic:
The semantics for
are given simply as a mapping into
:
Of course, it is necessary to prove that the output of
is indeed a deterministic
computation that always terminates, but once this is proven, there is no
longer any need to prove these properties for each of our programs.
Simply by expressing
in
, we have shown it to terminate:
One could also imagine using similarly restricted languages to ensure, by construction, that more domain-specific properties are satisfied – The use of restricted languages to generalise verification of such properties from one program to many programs is precisely the method of the Trustworthy File Systems project I work on at NICTA (Keller et al. 2013).
This doesn’t just give us the best of both worlds, it lets us pick one of an infinite variety of worlds to live in – the world of terminating imperative programs, the world of purely functional programs, the world of nondeterministic programs, or even the world of STM transactions, where access to IO must be restricted for the transactional abstraction to be safe (Harris et al. 2005). We can pick whatever sublanguage we want, with whatever reasoning characteristics we want, and thus easily establish properties about our programs almost for free – they come from the language used, not the program definition. In addition, as Wadler (1995) notes, these languages all have the same monadic structure, which allows high-level operations to be expressed in terms of the monad abstraction rather than in terms of any particular language.
All of this is only possible because we have a pure, total substrate
within which these languages are embedded. I could define a
language or an
language but it would be useless if the ambient lambda calculus was not
so restricted – there is no way to prevent IO during an STM transaction
or a diverging expression from being invoked from inside a
, when evaluation could have such an effect.
Lastly, it’s worth noting that the Haskell programming language differs from the lambda calculus used here in several important respects. Chiefly, it allows partiality and general recursion, making the language turing-complete. This still falls under the definition of “purely functional” – indeed, its non-strict semantics make the sort of reasoning I used here relatively harmless, even in the presence of “bottom” values arising from nontermination or partiality (see Danielsson et al. 2006). Turner (2004) has advocated strongly for language-enforced totality, but my impression is that total programming is still a fledgling idea, with some distance to cross before it is practical.
References
The variables
and
stand for arithmetic and boolean expressions, respectively.↩︎
That is, a set equipped with a greater-than relation such that each descending chain
is finite.↩︎
True is considered greater than False.↩︎
With standard implementation of addition:
↩︎
Even if we added a fixpoint loop combinator as a primitive, making the language turing-complete, we could easily prove termination of
by noting that it never uses it.↩︎
The imperative language could have used a bounded loop primitive instead of the unbounded one, and also been automatically terminating, but this is not reflective of idiomatic imperative programming, where the same looping construct is used for both infinite and terminating loops. We will examine terminating imperative languages later in the article.↩︎
Destructive update is perhaps the thorniest effect to deal with, so handling for other effects like console output are left as an exercise.↩︎
It just so happens that this embedded language forms a monad, but monads are not the salient point here.↩︎