# Total Denotational Semantics

September 23, 2024Denotational semantics assign meaning to a program (e.g., in untyped lambda
calculus) by mapping the program into a self-contained domain model in some meta
language (e.g., Scott domains).
Traditionally, what is complicated about denotational semantics is not so much
the *function* that defines them; rather it is to find a sound mathematical
definition of the *semantic domain*, and a general methodology of doing so that
scales to recursive types and hence general recursion, global mutable state,
exceptions and concurrency^{1}^{2}.

In this post, I discuss a related issue: I argue that traditional Scott/Strachey
denotational semantics are *partial* (in a precise sense), which means that

- It is impossible to give a faithful, executable encoding of such a semantics in a programming language, and
- Internal details of the semantic domain inhibit high-level, equational reasonining about programs

After exemplifying the problem, I will discuss *total* denotational semantics as
a viable alternative, and how to define one using guarded recursion.

I do not claim that any of these considerations are novel or indisputable, but I hope that they are helpful to some people who

- know how to read Haskell
- like playing around with operational semantics and definitional interpreters
- wonder how denotational semantics can be executed in a programming language
- want to get excited about guarded recursion.

I hope that this topic becomes more accessible to people with this background
due to a focus on *computation*.

I also hope that this post finds its way to a few semanticists who might provide a useful angle or have answers to the conjectures in the later parts of this post.

If you are in a rush and just want to see how a total denotational semantics can be defined in Agda, have a look at this gist.

# Syntax

We start by defining an AST type for lambda calculus:

```
type Name = String -- x ∈ Name
data Exp -- e
= Var Name -- ::= x
| App Exp Exp -- | e1 e2
| Lam Name Exp -- | λx.e
```

I will use the usual short-hand notation (BNF in comments above)

(λx.x)~(λy.y)

to parse as

`Lam "x" (Var "x") `App` (Lam "y" (Var "y"))`

and omit the parser that does this transformation.
I refer to `Exp`

as the *object language* and to mathematics/Haskell as our
*meta language* in which we implement our semantics.

# Call-by-name Semantics

Let us now define a denotational semantics for this language. We will first do this in mathematics and later try and turn this definition into Haskell.

A denotational semantics is a function
\mathcal{I}\llbracket\cdot\rrbracket_{\cdot} : \mathsf{Exp} \to \mathsf{Map}(\mathsf{Name}, D) \to D
such that \mathcal{I}\llbracket e\rrbracket_ρ gives an expression e a meaning, or *denotation*,
in terms of some semantic domain D.
I pronounce \mathcal{I}\llbracket e\rrbracket_ρ as “interpret e in environment ρ”.
The environment ρ gives meaning to the free variables of e,
by mapping each free variable to its denotation in D, and
\mathsf{Map}(\mathsf{Name}, D) is the type of finite maps (think: `Data.Map.Map`

).
I will write
[x_1\mapsto d_1,...,x_n \mapsto d_n] for a (possibly empty) finite map literal
that maps name x_i to d_i,
ρ[x\mapsto d] for inserting the new mapping x \mapsto d into the finite map
ρ,
\mathsf{dom}(ρ) for the keys that are present in ρ, and
ρ(x) for looking up name x in ρ (provided it is present).

With that disclaimer out of the way, here is the denotational semantics:

\begin{aligned} \mathcal{I}\llbracket x\rrbracket_ρ & = \begin{cases} ρ(x) & x \in \mathsf{dom}(ρ) \\ \bot & \text{otherwise} \end{cases} \\ \mathcal{I}\llbracket λx.e\rrbracket_ρ & = \mathsf{Fun}(f) \text{ where }f(d) = \mathcal{I}\llbracket e\rrbracket_{ρ[x \mapsto d]} \\ \mathcal{I}\llbracket e_1~e_2\rrbracket_ρ & = \mathit{app}(\mathcal{I}\llbracket e_1\rrbracket_ρ, \mathcal{I}\llbracket e_2\rrbracket_ρ) \\ \mathit{app}(d,a) &= \begin{cases} f(a) & \mathsf{Fun}(f) = d \\ \bot & \text{otherwise} \\ \end{cases} \end{aligned}This is the standard by-name definition, but I have omitted an important detail. I have not defined the semantic domain D yet, so I shall do that now.

# Scott Domain

To a first approximation, we can think of the Scott domain D as the Haskell data type

```
data D -- d
= Fun (D -> D) -- ::= Fun (f ∈ D → D)
| Bot -- | ⊥
```

Alas, this definition is not *well-founded*, and thus does not denote a proper
set (there is no set D such that D contains functions D \to D).
To make it well-founded, we need to impose an approximation order ⊑ on
D in which \bot ⊏ \mathsf{Fun}(f) and restrict the function type \to to
functions that are monotone (\simeq more defined input leads to more defined
output) and even continuous (\simeq the output on infinite input is determined
by the finite prefixes of the input) in D.
It is not important what *exactly* continuity means at this point, only that it
is impossible to impose this restriction in the type system of Haskell, let alone
enforce it as an invariant of `D`

.

Henceforth, for every function f that we stick into the \mathsf{Fun} constructor, we need to prove that it is continuous, including the function f(d) = \mathcal{I}\llbracket e\rrbracket_{ρ[x \mapsto d]} in the lambda case of the semantics. That in turn means we need to prove that the interpretation function is continuous as well, otherwise \mathcal{I}\llbracket\cdot\rrbracket is not a well-defined mathematical object. One can see that this is quite tedious to prove by hand and in practice this obligation is often hand-waved away, endangering the well-definedness of any concept that builds on \mathcal{I}\llbracket\cdot\rrbracket.

Let me support our understanding of \mathcal{I}\llbracket\cdot\rrbracket with a few examples:

- \mathcal{I}\llbracket λx.x\rrbracket_{[]} = \mathsf{Fun}(d \mapsto d).
Note that I write d \mapsto d for a lambda expression in math, distinct from the
*syntax*(λd.d). - \mathcal{I}\llbracket(λx.x)~(λy.y)\rrbracket_{[]} = \mathit{app}(\mathsf{Fun}(d \mapsto d), \mathcal{I}\llbracket λy.y\rrbracket_{[]}) = \mathcal{I}\llbracket λy.y\rrbracket_{[]} = \mathsf{Fun}(d \mapsto d).
- \mathcal{I}\llbracket x~(λy.y)\rrbracket_{[]} = \mathit{app}(\mathcal{I}\llbracket x\rrbracket_{[]}, \mathcal{I}\llbracket λy.y\rrbracket_{[]}) = \mathit{app}(\bot, \mathcal{I}\llbracket λy.y\rrbracket_{[]}) = \bot. The bottom element \bot is used to indicate a stuck program, in this case because variable x is not “in scope” in the empty environment [].

# A *partially-defined* denotational interpreter

Now consider the following attempt to make this semantics executable in Haskell:

```
interp :: Map Name D -> Exp -> D
Var x) = case Map.lookup x env of
interp env (Just d -> d
Nothing -> Bot
Lam x e) = Fun (\d -> interp (Map.insert x d env) e)
interp env (App e1 e2) = case interp env e1 of
interp env (Fun f -> f (interp env e2)
-> Bot _
```

I call `interp`

a *denotational interpreter*, as suggested by
Matthew Might.
The single most characteristic feature distinguishing it from a big-step style
definitional interpreter is the lambda case.
Note that the denotational interpreter recurses into the lambda body, sticking a
function of type `D -> D`

into the semantic `Fun`

constructor, whereas a
big-step style interpreter would simply return the syntax `Lam x e`

.

Surprisingly, the above translation of \mathcal{I}\llbracket\cdot\rrbracket is *almost* correct!
It is correct whenever the object program terminates and
provides the same result as the denotational semantics for all three examples above.

However, when the object program does *not* terminate, the denotational
interpreter above does neither.
One example of such a program is \Omega := (λx.x~x)~(λx.x~x).
Running `interp`

on the parse of this program simply loops forever, whereas the
denotational semantics \mathcal{I}\llbracket\cdot\rrbracket assigns the meaning \bot to
\Omega, as common wisdom would have it.
Thus, when viewed as a mathematical function, `interp`

above is only
partially-defined, or just *partial*.
This is in contrast to \mathcal{I}\llbracket\cdot\rrbracket, which is a *total* mathematical
function, defined on every input.

(Doesn’t “\mathcal{I}\llbracket\cdot\rrbracket is total” contradict the title of this post? How can a function such as \mathcal{I}\llbracket\cdot\rrbracket return \bot but still be considered total? Park these thoughts and read on.)

**It appears we cannot find a faithful executable definition of
\mathcal{I}\llbracket\cdot\rrbracket in Haskell**, one which encodes a *total* mathematical
function \mathcal{I}\llbracket\cdot\rrbracket.
Which is a pity, because a definition in Haskell greatly helps in prototyping,
exploring and even formalising such semantics.

# Divergence is hard to grasp in Scott domains

Actually, who says that \mathcal{I}\llbracket\Omega\rrbracket_{[]} = \bot? If we just calculate with the definition of \Omega, we get \begin{aligned} \mathcal{I}\llbracket\Omega\rrbracket_{[]} &= \mathcal{I}\llbracket(λx.x~x)~(λx.x~x)\rrbracket_{[]} \\ &= \mathit{app}(\mathcal{I}\llbracket λx.x~x\rrbracket_{[]}, \mathcal{I}\llbracket λx. x~x\rrbracket_{[]}) \\ &= \mathit{app}(\mathsf{Fun}(d \mapsto \mathcal{I}\llbracket x~x\rrbracket_{[x\mapsto d]}), \mathcal{I}\llbracket λx. x~x\rrbracket_{[]}) \\ &= \mathcal{I}\llbracket x~x\rrbracket_{[x\mapsto \mathcal{I}\llbracket λx. x~x\rrbracket_{[]}]} \\ &= \mathcal{I}\llbracket(λx.x~x)~(λx.x~x)\rrbracket_{[]} \\ &= \mathcal{I}\llbracket\Omega\rrbracket_{[]} \\ \end{aligned}A circular rewrite! This chain of reasoning would hold true regardless of what value we would assign to \Omega. One way to prove that \mathcal{I}\llbracket\Omega\rrbracket_{[]} = \bot is by generalising this statement to \mathrm{Y}(\mathit{id}) and then understanding Example 3.6 of Pitts’ Relational Properties of Domain. Needless to say, even such an “obvious” semantic fact such as “\Omega diverges” does not hold by simple calculation and does not even appear to be true in some (inadequate) semantic domain models. No wonder that it isn’t possible to compute it in Haskell either; after all, computation is just a directed form of equational reasoning.

# Total denotational interpreter

Haskell is a lazy language.
As a Haskeller, I eat infinite lists for breakfast and enjoy it, because I know when
to stop `take`

ing (🥁📀).
Can we somehow use laziness to encode diverging programs?

It turns out that, yes, we can!
We only need to adjust our semantic domain `D`

as follows:

```
data T a = Ret !a | Step (T a)
data Value = Fun (D -> D) | Stuck
type D = T Value
```

The coinductive `T`

data type (for “trace”) is a
classic. It forms a monad as follows:

```
instance Monad T where
return = Ret
Ret a >>= k = k a
Step d >>= k = Step (d >>= k)
```

Note that `Step`

is lazy, so we get diverging program “traces” as follows:

```
diverge :: D
= Step diverge diverge
```

It is pretty simple to run a `D`

for a finite amount of time:

```
takeD :: Int -> T a -> Maybe a
Ret a) = Just a
takeD _ (0 _ = Nothing
takeD Step d) = takeD (n-1) d takeD n (
```

The use of `Step`

is crucial to stratify diverging computations into an
infinite layering of finite computations, separated by `Step`

constructors.
In other words: We use coinduction (well, Löb induction, but close enough) to
encode diverging programs.

It is sufficient to delay in a single place: right before we put a denotation into the environment. (A more common alternative in the literature is to delay in the variable case instead.) The new definition becomes

```
denot :: Map Name D -> Exp -> D
Var x) = case Map.lookup x env of
denot env (Just d -> d
Nothing -> Stuck
Lam x e) = Fun (\d -> denot (Map.insert x (Step d) env) e)
denot env (App e1 e2) = denot env e1 >>= \case
denot env (Fun f -> f (denot env e2)
-> return Stuck _
```

(A self-contained Haskell playground for `denot`

can be found here.)

We use monadic bind in the application case to sequence computations.
Note that if evaluation of `e1`

diverges, the continuation of `>>=`

will never be called.

Using `takeD`

, we may now execute a program for any number of `Step`

s!
`denot`

thus becomes a *total* function, defined by mixed induction (for `denot`

)
and Löb induction (to define the lambda constructor).
Here are a two more examples:

```
-- |
-- >>> takeD 10000 $ denot Map.empty idid
-- Just (Fun ...)
-- >>> denot Map.empty idid
-- Step (Ret (Fun ...))
idid :: Exp
= Lam "x" (Var "x") `App` Lam "y" (Var "y")
idid
-- |
-- >>> takeD 10000 $ denot Map.empty omega
-- Nothing
omega :: Exp
= Lam "x" (Var "x" `App` Var "x") `App` Lam "y" (Var "y" `App` Var "y") omega
```

If you want to see more of these, I encourage you to read Section 4 of our work on abstracting denotational interpreters.

### Why explicit `Stuck`

?

I renamed `Bot`

to `Stuck`

in the above definition of `Value`

because that is
what its use now encodes: stuck programs terminate with a `Stuck`

value that
works like an exception or Milner’s \textbf{wrong}
value (as in “well-typed programs cannot go wrong”).
This is observably distinct from the `diverge`

denotation for diverging
programs.
I *could* have used `diverge`

instead of `return Stuck`

, however that would be
a bit misleading.
(As misleading as in the vanilla denotational semantics, where stuckness and
divergence are confused.)

# Total denotational semantics

We can interpret the above definition of `D`

straight as a domain equation of
(flat) Scott domains.
To do so, we need to adjoin a distinct bottom element \bot again, with
the property that \bot ⊏ \mathsf{Fun}(f), \bot ⊏ \mathsf{Stuck} and
\mathsf{Fun}(f) incomparable to \mathsf{Stuck}.

Here is the redefinition of \mathcal{I}\llbracket\cdot\rrbracket in terms of the rejigged D (calling it \mathcal{D}\llbracket\cdot\rrbracket for lack of a differentiating name):

\begin{aligned} \mathcal{D}\llbracket x\rrbracket_ρ & = \begin{cases} ρ(x) & x \in \mathsf{dom}(ρ) \\ \mathsf{Ret}(\mathsf{Stuck}) & \text{otherwise} \end{cases} \\ \mathcal{D}\llbracket λx.e\rrbracket_ρ & = \mathsf{Fun}(f) \text{ where }f(d) = \mathcal{D}\llbracket e\rrbracket_{ρ[x \mapsto \mathsf{Step}(d)]} \\ \mathcal{D}\llbracket e_1~e_2\rrbracket_ρ & = \mathcal{D}\llbracket e_1\rrbracket_ρ >\!\!>\!\!=v \mapsto \mathit{app}(v, \mathcal{D}\llbracket e_2\rrbracket_ρ) \\ \mathit{app}(v,a) &= \begin{cases} f(a) & \mathsf{Fun}(f) = v \\ \mathsf{Ret}(\mathsf{Stuck}) & \text{otherwise} \\ \end{cases} \end{aligned}I omitted the definition of the bind operator >\!\!>\!\!=, which is exactly as in Haskell.

It is reasonably simple to see by equational reasoning that \Omega diverges:

\begin{aligned} \mathcal{D}\llbracket\Omega\rrbracket_{[]} &= \mathcal{D}\llbracket(λx.x~x)~(λx.x~x)\rrbracket_{[]} \\ &= \mathcal{D}\llbracket λx.x~x\rrbracket_{[]} >\!\!>\!\!=v \mapsto \mathit{app}(v, \mathcal{D}\llbracket λx. x~x\rrbracket_{[]}) \\ &= \mathsf{Ret}(\mathsf{Fun}(d \mapsto \mathcal{D}\llbracket x~x\rrbracket_{[x\mapsto \mathsf{Step}(d)]})) >\!\!>\!\!=v \mapsto \mathit{app}(v, \mathcal{D}\llbracket λx. x~x\rrbracket_{[]}) \\ &= \mathcal{D}\llbracket x~x\rrbracket_{[x\mapsto \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})]} \\ &= \mathcal{D}\llbracket x\rrbracket_{[x\mapsto \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})]} >\!\!>\!\!=v \mapsto \mathit{app}(v, \mathcal{D}\llbracket x\rrbracket_{[x\mapsto \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})]}) \\ &= \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]}) >\!\!>\!\!=v \mapsto \mathit{app}(v, \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})) \\ &= \mathsf{Step}(\mathsf{Ret}(\mathsf{Fun}(d \mapsto \mathcal{D}\llbracket x~x\rrbracket_{[x\mapsto \mathsf{Step}(d)]}))) >\!\!>\!\!=v \mapsto \mathit{app}(v, \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})) \\ &= \mathsf{Step}(\mathcal{D}\llbracket x~x\rrbracket_{[x\mapsto \mathsf{Step}(\mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]}))]}) \\ &= ... \\ &= \mathsf{Step}^n(\mathcal{D}\llbracket x~x\rrbracket_{[x\mapsto \mathsf{Step}^m(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})]}) \\ &= ... \\ &= \mathsf{Step}^ω = \mathit{diverge} \\ \end{aligned}So that’s already nicer than with \mathcal{I}\llbracket\cdot\rrbracket.
Another key difference to \mathcal{I}\llbracket\cdot\rrbracket is that I believe \mathcal{D}\llbracket\cdot\rrbracket
to be total, which implies that it maps total environments to total denotations^{3}.
(I’m *very optimistically* omitting a proof here, the novelty of which would
be unlikely. See the footnote.)

An element of a domain d is *total*^{4} (or, *maximal*) if there are no other
elements d' above it, so d ⊑ d' \implies d = d'.
An element d is *partial* if there exists an element d' above it, d ⊏ d'.

Clearly, the bottom element \bot is as partial as it gets (in a non-trivial
domain such as D).
There are other *finite* partial elements such as \mathsf{Step}(\bot), but also
*infinite* partial elements such as (\bot,(\bot,(\bot,...))) (if our D had a
pair constructor).
There are *finite* total elements such as \mathsf{Ret}(\mathsf{Stuck}) and *infinite*
total elements such as \mathsf{Step}^ω (corresponding to `diverge`

).

*In this precise sense*, the vanilla denotational semantics \mathcal{I}\llbracket e\rrbracket is a
*partial* element of the monotone function space
\mathsf{Map}(\mathsf{Name}, D) \to D (it returns \bot in a couple of
situations) and \mathcal{D}\llbracket e\rrbracket is a *total* element of the monotone function space
\mathsf{Map}(\mathsf{Name}, D) \to D.
Note how this notion of partiality is different from partially of
a mathematical function: \mathcal{I}\llbracket e\rrbracket is a partial *element*, but a total *function*.

### Reasoning and computing with total elements

The crucial insight is this: **In programming languages, we can only really
reason equationally about and compute with total elements (finite or
infinite), but not so easily with partial ones**, because that requires
reasoning in the actual formal

*semantics*for the programming language (rather than reasoning equationally on its

*syntax*).

It is tiresome that in order to reason about diverging programs in our object
language `Exp`

, we need to know whether a Haskell program diverges.
This means we need to know two formal semantics: a small one for `Exp`

and a big
one for Haskell.

### Total elements and guarded recursion abstract away the approximation order

There is another advantage to such total denotational semantics:
Since total elements have no elements above them, the approximation order on
total elements is *discrete*, just like we are used from set theory.
That is, \mathsf{Ret}(\mathsf{Stuck}) is incomparable to
\mathsf{Step}(\mathsf{Ret}(\mathsf{Stuck})) because both are total elements.
Thus, one can write total denotational semantics without thinking about
the approximation order, except in one specific case: encodings of recursion.

Our denotational semantics encodes recursion in the seemingly paradoxical type of the \mathsf{Fun} constructor, the field of which is a function taking a D as argument. Such negative recursive occurrences are the bane of semanticists, but they have developed a remedy: Guarded type systems.

### Guarded types erase to continuous definitions

In a guarded type system, one would declare that \mathsf{Fun} has type
\blacktriangleright D \to D, where \blacktriangleright is an applicative functor
called the *later modality*, and the type checker would happily accept such
a type. The later modality can be thought of as a principled way to introduce a
lazy thunk. As such, we may change the type of \mathsf{Step} to \blacktriangleright D \to
D, encoding the lazy nature of \mathsf{Step}.
Do note that the d introduced in the lambda case of \mathcal{D}\llbracket\cdot\rrbracket has type
\blacktriangleright D, so it is important to give \mathsf{Step} this delaying type in
order for the interpreter to be well-typed and total.

Et voilà, we have just proven that \mathcal{D}\llbracket\cdot\rrbracket is well-defined and total without (directly) appealing to any domain theory!

Presently, only few proof assistants have support for guarded types, but
Agda is one of them (albeit with rather rudimentary support).
You can find an Agda encoding of \mathcal{D}\llbracket\cdot\rrbracket in
this gist.
A self-contained Haskell playground for `denot`

can be found here.

# Conclusion

This post was kind of a mixed bag of thoughts I had while working on our paper Abstracting Denotational Interpreters, where I found it frustrating that I was lacking both the space and the words to phrase this article into a proper motivation for the use of guarded type theory. The ultimate goal of this post is to lure people who are familiar with traditional denotational semantics into learning about total denotational semantics and guarded type theory.

I sincerely believe that guarded type theory is the future for total, executable
reasoning about what today is often branded coinduction, since it is much more
applicable than mere coinduction but also different in subtle ways^{5}.
(For example, `D`

could not be defined by coinduction because of the negative
recursion in `Fun`

.)
Furthermore, it comes with decent (yet still incomplete) axiomatisations in
proof assistants.

I have glossed over *a lot* of related work in this post in order not to break the flow.
Apologies for that.
I suggest to have a look at the PLS lab page on guarded recursion,
and perhaps start by reading Bob Atkey and Conor McBride’s classic Productive
Coprogramming with Guarded Recursion (which
does not cover negative recursive occurrences yet).
One of the most exciting recent results in this area is a formal model for
impredicate guarded dependent type theory
(phew!), which can serve as the justification for an axiomatisation in Rocq and
Lean, which has since been used to define
the equivalent of the Iris higher-order concurrent separation logic framework.
The trace type `T`

above is a much simplified version of a
guarded
interaction tree.
Definitional equality on (guarded or Scott) `D`

is also *not* a very useful
program equivalence – you would still need to appeal to the coarser contextual
equivalence or a custom logical relation for proving interesting properties
about program rewrites.
My interest is in the context of abstract interpretation, where it is sufficient
to view equivalence modulo the abstraction function α.

The classic text demonstrating these issues is A Syntactic Approach to Type Soundness.↩︎

Incredibly smart semanticists such as Lars Birkedal, Jon Sterling and others are continually improving the “internal language” (which I think means “domain-specific language” to category theorists) for defining such complex semantic domains; the subject of synthetic domain theory. This talk by Jon (with questions from 90 years old Dana Scott!) provides a lot of historic context and can serve as a technical introduction as well.↩︎

On the monotone function space D \to D, a function f is only total when it is also maximally non-strict – that is, f(d) must be the greatest lower bound of its upper set in the range of f. More formally, f(d) = ⊓ \{ d' \mid d'∈ \mathsf{rng}(f) \land f(d) ⊑ d' \}. For any function f defined on total inputs, this can easily be achieved by extending it to partial inputs via f(d) := ⊓ \{ f(d') \mid d' ∈ D \land d \text{ total} \land d ⊑ d' \}. Note that any f' such that f ⊏ f' must overcommit for one input d_1 (so f(d_1)⊏f'(d_1)), but then there must exist another d_2 such that d_1 ⊏ d_2 for which f(d_2) \not⊑ f'(d_2), otherwise f'(d_1) would be a greater lower bound than f(d_1); contradiction.↩︎

This is at least the terminology used in Cartwright et al., which I very much enjoyed.↩︎

For a comparison of induction, coinduction and guarded recursion, have a look at Section 16 of the Iris lecture notes.↩︎