module 2015-08-23-verified-compiler where open import Data.Fin hiding (_+_; _≟_) renaming (#_ to i) open import Data.Nat hiding (_≟_) open import Data.Vec hiding (_>>=_; _⊛_)

Recently my research has been centered around the development of a self-certifying compiler for a functional language with linear types called Cogent (see O’Connor *et al.* (2016)). The compiler works by emitting, along with generated low-level code, a proof in Isabelle/HOL (see Nipkow *et al.* (2002)) that the generated code is a refinement of the original program, expressed via a simple functional semantics in HOL.

As dependent types unify for us the language of code and proof, my current endeavour has been to explore how such a compiler would look if it were implemented and verified in a dependently typed programming language instead. In this post, I implement and verify a toy compiler for a language of arithmetic expressions and variables to an idealised assembly language for a virtual stack machine, and explain some of the useful features that dependent types give us for writing verified compilers.

*The Agda snippets in this post are interactive! Click on a symbol to see its definition.*

## Wellformedness

One of the immediate advantages that dependent types give us is that we can encode the notion of *term wellformedness* in the type given to terms, rather than as a separate proposition that must be assumed by every theorem.

Even in our language of arithmetic expressions and variables, which does not have much of a static semantics, we can still ensure that each variable used in the program is bound somewhere. We will use indices instead of variable names in the style of de Bruijn (1972), and index terms by the *number of available variables*, a trick I first noticed in McBride (2003). The `Fin`

type, used to represent variables, only contains natural numbers up to its index, which makes it impossible to use variables that are not available.

data Term (n : ℕ) : Set where Lit : ℕ → Term n _⊠_ : Term n → Term n → Term n _⊞_ : Term n → Term n → Term n Let_In_ : Term n → Term (suc n) → Term n Var : Fin n → Term n

This allows us to express in the *type* of our big-step semantics relation that the environment `E`

(here we used the length-indexed `Vec`

type from the Agda standard library) should have a value for every available variable in the term. In any Isabelle specification of the same, we would have to add such length constraints as explicit assumptions, either in the semantics themselves or in theorems about them. In Agda, the dynamic semantics are extremely clean, unencumbered by irritating details of the encoding:

infixl 5 _⊢_⇓_ data _⊢_⇓_ {n : ℕ} ( E : Vec ℕ n) : Term n → ℕ → Set where lit-e : ∀{n} ------------- → E ⊢ Lit n ⇓ n times-e : ∀{e₁ e₂}{v₁ v₂} → E ⊢ e₁ ⇓ v₁ → E ⊢ e₂ ⇓ v₂ --------------------- → E ⊢ e₁ ⊠ e₂ ⇓ v₁ * v₂ plus-e : ∀{e₁ e₂}{v₁ v₂} → E ⊢ e₁ ⇓ v₁ → E ⊢ e₂ ⇓ v₂ --------------------- → E ⊢ e₁ ⊞ e₂ ⇓ v₁ + v₂ var-e : ∀{n}{x} → E [ x ]= n ------------- → E ⊢ Var x ⇓ n let-e : ∀{e₁}{e₂}{v₁ v₂} → E ⊢ e₁ ⇓ v₁ → (v₁ ∷ E) ⊢ e₂ ⇓ v₂ --------------------- → E ⊢ Let e₁ In e₂ ⇓ v₂

By using appropriate type indices, it is possible to extend this technique to work even for languages with elaborate static semantics. For example, linear type systems (see Walker (2005)) can be encoded by indexing terms by type contexts (in a style similar to Oleg). Therefore, the boundary between being *wellformed* and being *well-typed* is entirely arbitrary. It’s possible to use relatively simple terms and encode static semantics as a separate judgement, or to put the entire static semantics inside the term structure, or to use a mixture of both. In this simple example, our static semantics only ensure variables are in scope, so it makes sense to encode the entire static semantics in the terms themselves.

Similar tricks can be employed when encoding our target language, the stack machine
. This machine consists of two stacks of numbers, the *working* stack
and the *storage* stack
, and a program to evaluate. A program is a list of *instructions*.

There are six instructions in total, each of which manipulate these two stacks in various ways. When encoding these instructions in Agda, we index the `Inst`

type by the size of both stacks before and after execution of the instruction:

data Inst : ℕ → ℕ → ℕ → ℕ → Set where num : ∀{w s} → ℕ → Inst w s (suc w) s plus : ∀{w s} → Inst (suc (suc w)) s (suc w) s times : ∀{w s} → Inst (suc (suc w)) s (suc w) s push : ∀{w s} → Inst (suc w) s w (suc s) pick : ∀{w s} → Fin s → Inst w s (suc w) s pop : ∀{w s} → Inst w (suc s) w s

Then, we can define a simple type for programs, essentially a list of instructions where the stack sizes of consecutive instructions must match. This makes it impossible to construct a program with an underflow error:

data SM (w s : ℕ) : ℕ → ℕ → Set where halt : SM w s w s _∷_ : ∀{w′ s′ w″ s″} → Inst w s w′ s′ → SM w′ s′ w″ s″ → SM w s w″ s″

We also define a simple sequential composition operator, equivalent to list append (`++`

):

infixr 5 _⊕_ _⊕_ : ∀{w s w′ s′ w″ s″} → SM w s w′ s′ → SM w′ s′ w″ s″ → SM w s w″ s″ halt ⊕ q = q (x ∷ p) ⊕ q = x ∷ (p ⊕ q)

The semantics of each instruction are given by the following relation, which takes the two stacks and an instruction as input, returning the two updated stacks as output. Note the size of each stack is proscribed by the type of the instruction, just as the size of the environment was proscribed by the type of the term in the source language, which eliminates the need to add tedious wellformedness assumptions to theorems or rules.

infixl 5 _∣_∣_↦_∣_ data _∣_∣_↦_∣_ : ∀{w s w′ s′} → Vec ℕ w → Vec ℕ s → Inst w s w′ s′ → Vec ℕ w′ → Vec ℕ s′ → Set where

The semantics of each instruction are as follows:

- (where ), pushes to .

num-e : ∀{w s}{n}{W : Vec _ w}{S : Vec _ s} ------------------------- → W ∣ S ∣ num n ↦ n ∷ W ∣ S

- , pops two numbers from and pushes their sum back to .

plus-e : ∀{w s}{n m}{W : Vec _ w}{S : Vec _ s} ---------------------------------------- → (n ∷ m ∷ W) ∣ S ∣ plus ↦ (m + n ∷ W) ∣ S

- , pops two numbers from and pushes their product back to .

times-e : ∀{w s}{n m}{W : Vec _ w}{S : Vec _ s} ----------------------------------------- → (n ∷ m ∷ W) ∣ S ∣ times ↦ (m * n ∷ W) ∣ S

- , pops a number from and pushes it to .

push-e : ∀{w s}{n}{W : Vec _ w}{S : Vec _ s} -------------------------------- → (n ∷ W) ∣ S ∣ push ↦ W ∣ (n ∷ S)

- (where ), pushes the number at position from the top of onto .

pick-e : ∀{w s}{x}{n}{W : Vec _ w}{S : Vec _ s} → S [ x ]= n ---------------------------- → W ∣ S ∣ pick x ↦ (n ∷ W) ∣ S

- , removes the top number from .

pop-e : ∀{w s}{n}{W : Vec _ w}{S : Vec _ s} ------------------------- → W ∣ (n ∷ S) ∣ pop ↦ W ∣ S

As programs are lists of instructions, the evaluation of programs is naturally specified as a list of evaluations of instructions:

infixl 5 _∣_∣_⇓_∣_ data _∣_∣_⇓_∣_ {w s : ℕ}(W : Vec ℕ w)(S : Vec ℕ s) : ∀{w′ s′} → SM w s w′ s′ → Vec ℕ w′ → Vec ℕ s′ → Set where halt-e : W ∣ S ∣ halt ⇓ W ∣ S _∷_ : ∀{w′ s′ w″ s″}{i}{is} → {W′ : Vec ℕ w′}{S′ : Vec ℕ s′} → {W″ : Vec ℕ w″}{S″ : Vec ℕ s″} → W ∣ S ∣ i ↦ W′ ∣ S′ → W′ ∣ S′ ∣ is ⇓ W″ ∣ S″ -------------------------- → W ∣ S ∣ (i ∷ is) ⇓ W″ ∣ S″

The semantics of sequential composition is predictably given by appending these lists:

infixl 4 _⟦⊕⟧_ _⟦⊕⟧_ : ∀{w w′ w″ s s′ s″}{P}{Q} → {W : Vec ℕ w}{S : Vec ℕ s} → {W′ : Vec ℕ w′}{S′ : Vec ℕ s′} → {W″ : Vec ℕ w″}{S″ : Vec ℕ s″} → W ∣ S ∣ P ⇓ W′ ∣ S′ → W′ ∣ S′ ∣ Q ⇓ W″ ∣ S″ ------------------------- → W ∣ S ∣ P ⊕ Q ⇓ W″ ∣ S″ halt-e ⟦⊕⟧ ys = ys x ∷ xs ⟦⊕⟧ ys = x ∷ (xs ⟦⊕⟧ ys)

## Writing by proving

Having formally defined our source and target languages, we can now prove our compiler correct – even though we haven’t written a compiler yet!

One of the other significant advantages dependent types bring to compiler verification is the elimination of repetition. In my larger Isabelle formalisation, the proof of the compiler’s correctness largely duplicates the structure of the compiler itself, and this tight coupling means that proofs must be rewritten along with the program – a highly tedious exercise. As dependently typed languages unify the language of code and proof, we can merely provide the correctness proof: in almost all cases, the correctness proof is so specific, that the program of which it demonstrates correctness can be *derived automatically*.

open import Data.Product open import Function open import Data.String open import Data.String.Unsafe

We define a compiler’s correctness to be the commutativity of the following diagram, as per Hutton & Wright (2004).

As we have not proven determinism for our semantics^{1}, such a correctness condition must be shown by the conjunction of a *soundness* and *completeness* condition, similar to Bahr (2015).

**Soundness** is a proof that the compiler output is a *refinement* of the input, that is, every evaluation in the output is matched by the input. The output does not do anything that the input doesn’t do.

-- Sound t u means that u is a sound translation of t Sound : ∀{w s} → Term s → SM w s (suc w) s → Set Sound {w} t u = ∀{v}{E}{W : Vec ℕ w} → W ∣ E ∣ u ⇓ (v ∷ W) ∣ E ----------------------- → E ⊢ t ⇓ v

Note that we generalise the evaluation statements used here slightly to use arbitrary environments and stacks. This is to allow our induction to proceed smoothly.

**Completeness** is a proof that the compiler output is an *abstraction* of the input, that is, every evaluation in the input is matched by the output. The output does everything that the input does.

Complete : ∀{w s} → Term s → SM w s (suc w) s → Set Complete {w} t u = ∀{v}{E}{W : Vec ℕ w} → E ⊢ t ⇓ v ----------------------- → W ∣ E ∣ u ⇓ (v ∷ W) ∣ E

It is this *completeness* condition that will allow us to automatically derive our code generator. Given a term
, our generator will return a Σ-type, or *dependent pair*, containing a
program called
and a proof that
is a complete translation of
:

codegen′ : ∀{w s} → (t : Term s) → Σ[ u ∈ SM w s (suc w) s ] Complete t u

For literals, we simply push the number of the literal onto the working stack:

codegen′ (Lit x ) = _ , proof where proof : Complete _ _ proof lit-e = num-e ∷ halt-e

The code above never explicitly states what program to produce! Instead, it merely provides the completeness proof, and the rest can be inferred by unification. Similar elision can be used for variables, which pick the correct index from the storage stack:

codegen′ (Var x) = _ , proof where proof : Complete _ _ proof (var-e x) = pick-e x ∷ halt-e

The two binary operations are essentially the standard translation for an infix-to-postfix tree traversal, but once again the program is not explicitly emitted, but is inferred from the completeness proof used.

codegen′ (t₁ ⊞ t₂) = _ , proof (proj₂ (codegen′ t₁)) (proj₂ (codegen′ t₂)) where proof : ∀ {u₁}{u₂} → Complete t₁ u₁ → Complete t₂ u₂ → Complete _ _ proof p₁ p₂ (plus-e t₁ t₂) = p₁ t₁ ⟦⊕⟧ p₂ t₂ ⟦⊕⟧ plus-e ∷ halt-e codegen′ (t₁ ⊠ t₂) = _ , proof (proj₂ (codegen′ t₁)) (proj₂ (codegen′ t₂)) where proof : ∀ {u₁}{u₂} → Complete t₁ u₁ → Complete t₂ u₂ → Complete _ _ proof p₁ p₂ (times-e t₁ t₂) = p₁ t₁ ⟦⊕⟧ p₂ t₂ ⟦⊕⟧ times-e ∷ halt-e

The variable-binding form pushes the variable to the storage stack and cleans up after evaluation exits the scope with .

codegen′ (Let t₁ In t₂) = _ , proof (proj₂ (codegen′ t₁)) (proj₂ (codegen′ t₂)) where proof : ∀ {u₁}{u₂} → Complete t₁ u₁ → Complete t₂ u₂ → Complete _ _ proof p₁ p₂ (let-e t₁ t₂) = p₁ t₁ ⟦⊕⟧ push-e ∷ (p₂ t₂ ⟦⊕⟧ pop-e ∷ halt-e)

We can extract a more standard-looking code generator function simply by throwing away the proof that our code generator produces.

codegen : ∀{w s} → Term s → SM w s (suc w) s codegen {w}{s} t = proj₁ (codegen′ {w}{s} t)

## The soundness proof for this code generator isn’t particularly illuminating and is rather awkwardly expressed. Nevertheless, for the truly brave, you may click here to view it.

We use an alternative presentation of the soundness property, that makes explicit several equalities that are implicit in the original formulation of soundness. We prove that our new formulation still implies the original one.

open import Relation.Binary.PropositionalEquality Sound′ : ∀{w s} → Term s → SM w s (suc w) s → Set Sound′ {w} t u = ∀{E E′}{W : Vec ℕ w}{W′} → W ∣ E ∣ u ⇓ W′ ∣ E′ ------------------------------------------ → (E ≡ E′) × (tail W′ ≡ W) × E ⊢ t ⇓ head W′ sound′→sound : ∀{w s}{t}{u} → Sound′ {w}{s} t u → Sound t u sound′→sound p x with p x ... | refl , refl , q = q

As our soundness proof requires us to do a lot of rule inversion on the evaluation of
programs, we need an eliminator for the introduction rule `_⟦⊕⟧_`

, used in the completeness proof, which breaks an evaluation of a sequential composition into evaluations of its component parts:

⊕-elim : ∀{w s w′ s′ w″ s″} {W : Vec ℕ w}{S : Vec ℕ s} {W″ : Vec ℕ w″}{S″ : Vec ℕ s″} {a : SM w s w′ s′}{b : SM w′ s′ w″ s″} → W ∣ S ∣ a ⊕ b ⇓ W″ ∣ S″ → ∃[ W′ ] ∃[ S′ ] ((W ∣ S ∣ a ⇓ W′ ∣ S′) × (W′ ∣ S′ ∣ b ⇓ W″ ∣ S″)) ⊕-elim {a = halt} p = _ , _ , halt-e , p ⊕-elim {a = a ∷ as} (x ∷ p) with ⊕-elim {a = as} p ... | _ , _ , p₁ , p₂ = _ , _ , x ∷ p₁ , p₂Then the soundness proof is given as a boatload of rule inversion and matching on equalities, to convince Agda that there is no other way to possibly evaluate the compiler output:

soundness : ∀{w s}{t : Term s} → Sound′ {w} t (codegen t) soundness {t = Lit x} (num-e ∷ halt-e) = refl , refl , lit-e soundness {t = Var x} (pick-e x₁ ∷ halt-e) = refl , refl , var-e x₁ soundness {t = t₁ ⊠ t₂} x with ⊕-elim {a = codegen t₁ ⊕ codegen t₂} x ... | _ , _ , p , _ with ⊕-elim {a = codegen t₁} p ... | _ , _ , p₁ , p₂ with soundness {t = t₁} p₁ | soundness {t = t₂} p₂ soundness {t = t₁ ⊠ t₂} x | _ ∷ _ ∷ _ , ._ , _ , times-e ∷ halt-e | ._ ∷ ._ , ._ , _ , _ | refl , refl , a | refl , refl , b = refl , refl , times-e a b soundness {t = t₁ ⊞ t₂} x with ⊕-elim {a = codegen t₁ ⊕ codegen t₂} x ... | _ , _ , p , _ with ⊕-elim {a = codegen t₁} p ... | _ , _ , p₁ , p₂ with soundness {t = t₁} p₁ | soundness {t = t₂} p₂ soundness {t = t₁ ⊞ t₂} x | _ ∷ _ ∷ _ , ._ , _ , plus-e ∷ halt-e | ._ ∷ ._ , ._ , _ , _ | refl , refl , a | refl , refl , b = refl , refl , plus-e a b soundness {t = Let t₁ In t₂} x with ⊕-elim {a = codegen t₁} x ... | _ ∷ _ , _ , p₁ , push-e ∷ q with ⊕-elim {a = codegen t₂} q ... | _ ∷ _ , _ ∷ _ , p₂ , _ with soundness {t = t₁} p₁ | soundness {t = t₂} p₂ soundness {t = Let t₁ In t₂} x | _ ∷ ._ , ._ , _ , push-e ∷ q | _ ∷ ._ , ._ ∷ ._ , _ , pop-e ∷ halt-e | refl , refl , a | refl , refl , b = refl , refl , let-e a b

## Compiler Frontend

Now that we have a verified code generator, as a final flourish we’ll implement a basic compiler frontend^{2} for our language and run it on some basic examples.

We define a surface syntax as follows. In the tradition of all the greatest languages such as BASIC, FORTRAN and COBOL, capital letters are exclusively used, and English words are favoured over symbols because it makes the language readable to non-programmers. I should also acknowledge the definite influence of PHP, Perl and `sh`

on the choice of the `$`

sigil to precede variable names. The sigil `#`

precedes numeric literals as Agda does not allow us to overload them.

data Surf : Set where LET_BE_IN_ : String → Surf → Surf → Surf _PLUS_ : Surf → Surf → Surf _TIMES_ : Surf → Surf → Surf $_ : String → Surf #_ : ℕ → Surf infixr 4 LET_BE_IN_ infixl 5 _PLUS_ infixl 6 _TIMES_ infix 7 $_ infix 7 #_

Unlike our `Term`

AST, this surface syntax does not include any scope information, uses strings for variable names, and is more likely to be something that would be produced from a parser. In order to compile this language, we must first translate it into our wellformed-by-construction `Term`

type, which necessitates *scope-checking*.

open import Data.Maybe open import Data.Maybe.Categorical open import Category.Monad open import Category.Applicative import Level open RawMonad (monad {Level.zero}) open import Relation.Nullary

check : ∀{n} → Vec String n → Surf → Maybe (Term n) check Γ (LET x BE s IN t) = pure Let_In_ ⊛ check Γ s ⊛ check (x ∷ Γ) t check Γ (s PLUS t) = pure _⊞_ ⊛ check Γ s ⊛ check Γ t check Γ (s TIMES t) = pure _⊠_ ⊛ check Γ s ⊛ check Γ t check Γ (# x) = pure (Lit x) check Γ ($ x) = pure Var ⊛ find Γ x where find : ∀{n} → Vec String n → String → Maybe (Fin n) find [] s = nothing find (x ∷ v) s with s ≟ x ... | yes _ = just zero ... | no _ = suc <$> find v s

Note that this function is the only one in our development that is partial: it can fail if an undeclared variable is used. For this reason, we use the `Applicative`

instance for `Maybe`

to make the error handling more convenient.

compiler : Surf → Maybe (SM 0 0 1 0) compiler s = codegen <$> check [] s

Note that we can’t really demonstrate correctness of the scope-checking function, save that if it outputs a `Term`

then there are no scope errors in
, as it is impossible to construct a `Term`

with scope errors. One possibility would be to define a semantics for the surface syntax, however this would necessitate a formalisation of substitution and other such unpleasant things. So, we shall gain assurance for this phase of the compiler by embedding some test cases and checking them automatically at compile time.

If we take a simple example, say:

example = LET "x" BE # 4 IN LET "y" BE # 5 IN LET "z" BE # 6 IN $ "x" TIMES $ "y" PLUS $ "z"

We expect that this program should correspond to the following
program:^{3}

result : SM 0 0 1 0 result = num 4 ∷ push ∷ num 5 ∷ push ∷ num 6 ∷ push ∷ pick (i 2) ∷ pick (i 1) ∷ times ∷ pick (i 0) ∷ plus ∷ pop ∷ pop ∷ pop ∷ halt

We can embed this test case as a type by constructing an equality value – that way, the test will be re-run every time it is type-checked:

test-example : compiler example ≡ just result test-example = refl

As this page is only generated when the Agda compiler type checks the code snippets, we know that this test has passed! Hooray!

## Conclusions

Working in Agda to verify compilers is a very different experience from that of implementing a certifying compiler in Haskell and Isabelle. In general, the *implementation* of a compiler phase and the *justification of its correctness* are much, much closer together than in Agda than in my previous approach. This allows us to save a lot of effort by deriving programs from their proofs.

Also, dependent types are sophisticated enough to allow arbitrary invariants to be encoded in the structure of terms, which makes it possible, with clever formalisations, to avoid having to discharge trivial proof obligations repeatedly. This is in stark contrast to traditional theorem provers like Isabelle, where irritating proof obligations are the norm, and heavyweight tactics must be used to discharge them en-masse.

My next experiments will be to try and scale this kind of approach up to more realistic languages. I’ll be sure to post again if I find anything interesting.

#### References

*Mathematics of program construction*, Vol. 9129, Springer International Publishing, pp. 159–186.

*Indagationes Mathematicae (Elsevier)*,

**34**, 381–392.

*Mathematics of program construction*, Springer, pp. 211–227.

*Journal of Functional Programming*,

**13**(6), 1061–1075.

*Isabelle/HOL: A Proof Assistant for Higher-Order Logic*, Vol. 2283, Springer.

*Submitted to POPL. Currently Under Review*.

*Advanced topics in types and programming languages*, MIT Press.