Total Denotational Semantics
September 23, 2024

Denotational semantics assign meaning to a program (e.g., in untyped lambda calculus) by mapping the program into a self-contained domain model in some meta language (e.g., Scott domains). Traditionally, what is complicated about denotational semantics is not so much the function that defines them; rather, it is finding a sound mathematical definition of the semantic domain, and a general methodology for doing so that scales to recursive types and hence general recursion, global mutable state, exceptions and concurrency.[1][2]
In this post, I discuss a related issue: I argue that traditional Scott/Strachey denotational semantics are partial (in a precise sense), which means that
- It is impossible to give a faithful, executable encoding of such a semantics in a programming language, and
- Internal details of the semantic domain inhibit high-level, equational reasoning about programs.
After exemplifying the problem, I will discuss total denotational semantics as a viable alternative, and how to define one using guarded recursion.
I do not claim that any of these considerations are novel or indisputable, but I hope that they are helpful to some people who
- know how to read Haskell
- like playing around with operational semantics and definitional interpreters
- wonder how denotational semantics can be executed in a programming language
- want to get excited about guarded recursion.
I hope that a focus on computation makes this topic more accessible to people with this background.
I also hope that this post finds its way to a few semanticists who might provide a useful angle or have answers to the conjectures in the later parts of this post.
If you are in a rush and just want to see how a total denotational semantics can be defined in Agda, have a look at this gist.
Syntax
We start by defining an AST type for lambda calculus:
type Name = String -- x ∈ Name
data Exp           -- e
  = Var Name       -- ::= x
  | App Exp Exp    --   | e1 e2
  | Lam Name Exp   --   | λx.e
I will use the usual shorthand notation (BNF in the comments above)
(λx.x)~(λy.y)
to parse as
Lam "x" (Var "x") `App` (Lam "y" (Var "y"))
and omit the parser that does this transformation.
I refer to Exp as the object language and to mathematics/Haskell as our meta language in which we implement our semantics.
Call-by-name Semantics
Let us now define a denotational semantics for this language. We will first do this in mathematics and later try to turn this definition into Haskell.
A denotational semantics is a function
\mathcal{I}\llbracket\cdot\rrbracket_{\cdot} : \mathsf{Exp} \to \mathsf{Map}(\mathsf{Name}, D) \to D
such that \mathcal{I}\llbracket e\rrbracket_ρ gives an expression e a meaning, or denotation,
in terms of some semantic domain D.
I pronounce \mathcal{I}\llbracket e\rrbracket_ρ as “interpret e in environment ρ”.
The environment ρ gives meaning to the free variables of e,
by mapping each free variable to its denotation in D, and
\mathsf{Map}(\mathsf{Name}, D) is the type of finite maps (think: Data.Map.Map).
I will write
- [x_1\mapsto d_1,...,x_n \mapsto d_n] for a (possibly empty) finite map literal that maps name x_i to d_i,
- ρ[x\mapsto d] for inserting the new mapping x \mapsto d into the finite map ρ,
- \mathsf{dom}(ρ) for the keys that are present in ρ, and
- ρ(x) for looking up name x in ρ (provided it is present).
With that notation out of the way, here is the denotational semantics:
\begin{aligned} \mathcal{I}\llbracket x\rrbracket_ρ & = \begin{cases} ρ(x) & x \in \mathsf{dom}(ρ) \\ \bot & \text{otherwise} \end{cases} \\ \mathcal{I}\llbracket λx.e\rrbracket_ρ & = \mathsf{Fun}(f) \text{ where }f(d) = \mathcal{I}\llbracket e\rrbracket_{ρ[x \mapsto d]} \\ \mathcal{I}\llbracket e_1~e_2\rrbracket_ρ & = \mathit{app}(\mathcal{I}\llbracket e_1\rrbracket_ρ, \mathcal{I}\llbracket e_2\rrbracket_ρ) \\ \mathit{app}(d,a) &= \begin{cases} f(a) & \mathsf{Fun}(f) = d \\ \bot & \text{otherwise} \\ \end{cases} \end{aligned}

This is the standard by-name definition, but I have omitted an important detail: I have not defined the semantic domain D yet, so I shall do that now.
Scott Domain
To a first approximation, we can think of the Scott domain D as the Haskell data type
data D             -- d
  = Fun (D -> D)   -- ::= Fun (f ∈ D → D)
  | Bot            --   | ⊥
Alas, this definition is not well-founded, and thus does not denote a proper
set (there is no set D such that D contains functions D \to D).
To make it well-founded, we need to impose an approximation order ⊑ on
D in which \bot ⊏ \mathsf{Fun}(f) and restrict the function type \to to
functions that are monotone (\simeq more defined input leads to more defined
output) and even continuous (\simeq the output on infinite input is determined
by the finite prefixes of the input) in D.
It is not important what exactly continuity means at this point, only that it is impossible to impose this restriction in the type system of Haskell, let alone enforce it as an invariant of D.
Henceforth, for every function f that we stick into the \mathsf{Fun} constructor, we need to prove that it is continuous, including the function f(d) = \mathcal{I}\llbracket e\rrbracket_{ρ[x \mapsto d]} in the lambda case of the semantics. That in turn means we need to prove that the interpretation function is continuous as well, otherwise \mathcal{I}\llbracket\cdot\rrbracket is not a well-defined mathematical object. One can see that this is quite tedious to prove by hand and in practice this obligation is often hand-waved away, endangering the well-definedness of any concept that builds on \mathcal{I}\llbracket\cdot\rrbracket.
Let me support our understanding of \mathcal{I}\llbracket\cdot\rrbracket with a few examples:
- \mathcal{I}\llbracket λx.x\rrbracket_{[]} = \mathsf{Fun}(d \mapsto d). Note that I write d \mapsto d for a lambda expression in math, distinct from the syntax (λd.d).
- \mathcal{I}\llbracket(λx.x)~(λy.y)\rrbracket_{[]} = \mathit{app}(\mathsf{Fun}(d \mapsto d), \mathcal{I}\llbracket λy.y\rrbracket_{[]}) = \mathcal{I}\llbracket λy.y\rrbracket_{[]} = \mathsf{Fun}(d \mapsto d).
- \mathcal{I}\llbracket x~(λy.y)\rrbracket_{[]} = \mathit{app}(\mathcal{I}\llbracket x\rrbracket_{[]}, \mathcal{I}\llbracket λy.y\rrbracket_{[]}) = \mathit{app}(\bot, \mathcal{I}\llbracket λy.y\rrbracket_{[]}) = \bot. The bottom element \bot is used to indicate a stuck program, in this case because variable x is not “in scope” in the empty environment [].
A partially-defined denotational interpreter
Now consider the following attempt to make this semantics executable in Haskell:
-- assuming: import Data.Map (Map); import qualified Data.Map as Map
interp :: Map Name D -> Exp -> D
interp env (Var x) = case Map.lookup x env of
  Just d  -> d
  Nothing -> Bot
interp env (Lam x e) = Fun (\d -> interp (Map.insert x d env) e)
interp env (App e1 e2) = case interp env e1 of
  Fun f -> f (interp env e2)
  _     -> Bot
I call interp a denotational interpreter, as suggested by Matthew Might. The single most characteristic feature distinguishing it from a big-step style definitional interpreter is the lambda case: the denotational interpreter recurses into the lambda body, sticking a function of type D -> D into the semantic Fun constructor, whereas a big-step style interpreter would simply return the syntax Lam x e.
Surprisingly, the above translation of \mathcal{I}\llbracket\cdot\rrbracket is almost correct! It is correct whenever the object program terminates and provides the same result as the denotational semantics for all three examples above.
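To make this concrete, here is a small usage sketch; isFun is a hypothetical helper of mine, needed because D contains functions and thus has no derivable Show instance:

-- A hypothetical observer for D, since we cannot derive Show for it:
isFun :: D -> Bool
isFun (Fun _) = True
isFun Bot     = False

-- Example 2 from above yields a function value:
-- >>> isFun (interp Map.empty (Lam "x" (Var "x") `App` Lam "y" (Var "y")))
-- True
-- Example 3 gets stuck on the free variable x:
-- >>> isFun (interp Map.empty (Var "x" `App` Lam "y" (Var "y")))
-- False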
However, when the object program does not terminate, the denotational
interpreter above does neither.
One example of such a program is \Omega := (λx.x~x)~(λx.x~x).
Running interp on the parse of this program simply loops forever, whereas the denotational semantics \mathcal{I}\llbracket\cdot\rrbracket assigns the meaning \bot to \Omega, as common wisdom would have it.
Thus, when viewed as a mathematical function, interp above is only partially-defined, or just partial. This is in contrast to \mathcal{I}\llbracket\cdot\rrbracket, which is a total mathematical function, defined on every input.
(Doesn’t “\mathcal{I}\llbracket\cdot\rrbracket is total” contradict the title of this post? How can a function such as \mathcal{I}\llbracket\cdot\rrbracket return \bot but still be considered total? Park these thoughts and read on.)
It appears we cannot find a faithful executable definition of \mathcal{I}\llbracket\cdot\rrbracket in Haskell, one which encodes a total mathematical function \mathcal{I}\llbracket\cdot\rrbracket. Which is a pity, because a definition in Haskell greatly helps in prototyping, exploring and even formalising such semantics.
Divergence is hard to grasp in Scott domains
Actually, who says that \mathcal{I}\llbracket\Omega\rrbracket_{[]} = \bot? If we just calculate with the definition of \Omega, we get

\begin{aligned} \mathcal{I}\llbracket\Omega\rrbracket_{[]} &= \mathcal{I}\llbracket(λx.x~x)~(λx.x~x)\rrbracket_{[]} \\ &= \mathit{app}(\mathcal{I}\llbracket λx.x~x\rrbracket_{[]}, \mathcal{I}\llbracket λx. x~x\rrbracket_{[]}) \\ &= \mathit{app}(\mathsf{Fun}(d \mapsto \mathcal{I}\llbracket x~x\rrbracket_{[x\mapsto d]}), \mathcal{I}\llbracket λx. x~x\rrbracket_{[]}) \\ &= \mathcal{I}\llbracket x~x\rrbracket_{[x\mapsto \mathcal{I}\llbracket λx. x~x\rrbracket_{[]}]} \\ &= \mathcal{I}\llbracket(λx.x~x)~(λx.x~x)\rrbracket_{[]} \\ &= \mathcal{I}\llbracket\Omega\rrbracket_{[]} \\ \end{aligned}

A circular rewrite! This chain of reasoning would hold true regardless of what value we assigned to \Omega. One way to prove that \mathcal{I}\llbracket\Omega\rrbracket_{[]} = \bot is by generalising this statement to \mathrm{Y}(\mathit{id}) and then understanding Example 3.6 of Pitts’ Relational Properties of Domains. Needless to say, even such an “obvious” semantic fact as “\Omega diverges” does not hold by simple calculation, and it does not even appear to be true in some (inadequate) semantic domain models. No wonder that it isn’t possible to compute it in Haskell either; after all, computation is just a directed form of equational reasoning.
Total denotational interpreter
Haskell is a lazy language.
As a Haskeller, I eat infinite lists for breakfast and enjoy it, because I know when to stop take-ing (🥁📀).
Can we somehow use laziness to encode diverging programs?
It turns out that, yes, we can!
We only need to adjust our semantic domain D as follows:
data T a = Ret !a | Step (T a)
data Value = Fun (D -> D) | Stuck
type D = T Value
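To build some intuition (the example value below is mine): a D is a finite or infinite tower of Step constructors that may bottom out in a Ret.

-- A computation that takes two steps and then gets stuck:
stuckInTwo :: D
stuckInTwo = Step (Step (Ret Stuck))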
The coinductive T data type (for “trace”) is a classic. It forms a monad as follows:
instance Monad T where
  return = Ret
  Ret a  >>= k = k a
  Step d >>= k = Step (d >>= k)
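One caveat for playing along at home: with modern GHC, a Monad instance also requires Functor and Applicative instances. A minimal sketch:

instance Functor T where
  fmap f (Ret a)  = Ret (f a)
  fmap f (Step d) = Step (fmap f d)

instance Applicative T where
  pure = Ret
  Ret f  <*> ta = fmap f ta
  Step d <*> ta = Step (d <*> ta)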
Note that Step is lazy in its field, so we get diverging program “traces” as follows:
diverge :: D
diverge = Step diverge
It is pretty simple to run a D for a finite amount of time:
takeD :: Int -> T a -> Maybe a
takeD _ (Ret a)  = Just a
takeD 0 _        = Nothing
takeD n (Step d) = takeD (n-1) d
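For example (the doctests are mine, assuming the definitions above):

-- >>> takeD 3 diverge
-- Nothing
-- >>> takeD 3 (Step (Step (Ret True)))
-- Just True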
The use of Step is crucial to stratify diverging computations into an infinite layering of finite computations, separated by Step constructors. In other words: we use coinduction (well, Löb induction, but close enough) to encode diverging programs.
It is sufficient to delay in a single place: right before we put a denotation into the environment. (A more common alternative in the literature is to delay in the variable case instead.) The new definition becomes
-- (needs {-# LANGUAGE LambdaCase #-})
denot :: Map Name D -> Exp -> D
denot env (Var x) = case Map.lookup x env of
  Just d  -> d
  Nothing -> return Stuck
denot env (Lam x e) = return (Fun (\d -> denot (Map.insert x (Step d) env) e))
denot env (App e1 e2) = denot env e1 >>= \case
  Fun f -> f (denot env e2)
  _     -> return Stuck
(A self-contained Haskell playground for denot can be found here.)
We use monadic bind in the application case to sequence computations.
Note that if evaluation of e1 diverges, the continuation of >>= will never be called.
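We can check that claim on a small example (the doctest is mine): binding past diverge only ever produces more Steps, so the continuation is unreachable.

-- >>> takeD 1000 (diverge >>= \_ -> Ret Stuck)
-- Nothing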
Using takeD, we may now execute a program for any number of Steps! denot thus becomes a total function, defined by mixed induction (for denot) and Löb induction (to define the lambda constructor). Here are two more examples:
-- |
-- >>> takeD 10000 $ denot Map.empty idid
-- Just (Fun ...)
-- >>> denot Map.empty idid
-- Step (Ret (Fun ...))
idid :: Exp
idid = Lam "x" (Var "x") `App` Lam "y" (Var "y")
-- |
-- >>> takeD 10000 $ denot Map.empty omega
-- Nothing
omega :: Exp
omega = Lam "x" (Var "x" `App` Var "x") `App` Lam "y" (Var "y" `App` Var "y")
If you want to see more of these, I encourage you to read Section 4 of our work on abstracting denotational interpreters.
Why explicit Stuck?
I renamed Bot to Stuck in the above definition of Value because that is what its use now encodes: stuck programs terminate with a Stuck value that works like an exception or Milner’s \textbf{wrong} value (as in “well-typed programs cannot go wrong”). This is observably distinct from the diverge denotation for diverging programs.
I could have used diverge instead of return Stuck; however, that would be a bit misleading (as misleading as the vanilla denotational semantics, where stuckness and divergence are conflated).
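Here is the observable difference in a nutshell (the doctests are mine, assuming a Show instance for Value):

-- A stuck program terminates with Stuck after finitely many steps...
-- >>> takeD 100 (denot Map.empty (Var "x"))
-- Just Stuck
-- ...whereas a diverging program exhausts any finite step budget:
-- >>> takeD 100 (denot Map.empty omega)
-- Nothing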
Total denotational semantics
We can interpret the above definition of D directly as a domain equation of (flat) Scott domains. To do so, we need to adjoin a distinct bottom element \bot again, with the property that \bot ⊏ \mathsf{Fun}(f), \bot ⊏ \mathsf{Stuck}, and \mathsf{Fun}(f) is incomparable to \mathsf{Stuck}.
Here is the redefinition of \mathcal{I}\llbracket\cdot\rrbracket in terms of the rejigged D (calling it \mathcal{D}\llbracket\cdot\rrbracket for lack of a differentiating name):
\begin{aligned} \mathcal{D}\llbracket x\rrbracket_ρ & = \begin{cases} ρ(x) & x \in \mathsf{dom}(ρ) \\ \mathsf{Ret}(\mathsf{Stuck}) & \text{otherwise} \end{cases} \\ \mathcal{D}\llbracket λx.e\rrbracket_ρ & = \mathsf{Ret}(\mathsf{Fun}(f)) \text{ where }f(d) = \mathcal{D}\llbracket e\rrbracket_{ρ[x \mapsto \mathsf{Step}(d)]} \\ \mathcal{D}\llbracket e_1~e_2\rrbracket_ρ & = \mathcal{D}\llbracket e_1\rrbracket_ρ >\!\!>\!\!= v \mapsto \mathit{app}(v, \mathcal{D}\llbracket e_2\rrbracket_ρ) \\ \mathit{app}(v,a) &= \begin{cases} f(a) & \mathsf{Fun}(f) = v \\ \mathsf{Ret}(\mathsf{Stuck}) & \text{otherwise} \\ \end{cases} \end{aligned}

I omitted the definition of the bind operator >\!\!>\!\!=, which is exactly as in Haskell.
It is reasonably simple to see by equational reasoning that \Omega diverges:
\begin{aligned} \mathcal{D}\llbracket\Omega\rrbracket_{[]} &= \mathcal{D}\llbracket(λx.x~x)~(λx.x~x)\rrbracket_{[]} \\ &= \mathcal{D}\llbracket λx.x~x\rrbracket_{[]} >\!\!>\!\!= v \mapsto \mathit{app}(v, \mathcal{D}\llbracket λx. x~x\rrbracket_{[]}) \\ &= \mathsf{Ret}(\mathsf{Fun}(d \mapsto \mathcal{D}\llbracket x~x\rrbracket_{[x\mapsto \mathsf{Step}(d)]})) >\!\!>\!\!= v \mapsto \mathit{app}(v, \mathcal{D}\llbracket λx. x~x\rrbracket_{[]}) \\ &= \mathcal{D}\llbracket x~x\rrbracket_{[x\mapsto \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})]} \\ &= \mathcal{D}\llbracket x\rrbracket_{[x\mapsto \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})]} >\!\!>\!\!= v \mapsto \mathit{app}(v, \mathcal{D}\llbracket x\rrbracket_{[x\mapsto \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})]}) \\ &= \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]}) >\!\!>\!\!= v \mapsto \mathit{app}(v, \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})) \\ &= \mathsf{Step}(\mathsf{Ret}(\mathsf{Fun}(d \mapsto \mathcal{D}\llbracket x~x\rrbracket_{[x\mapsto \mathsf{Step}(d)]}))) >\!\!>\!\!= v \mapsto \mathit{app}(v, \mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})) \\ &= \mathsf{Step}(\mathcal{D}\llbracket x~x\rrbracket_{[x\mapsto \mathsf{Step}(\mathsf{Step}(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]}))]}) \\ &= ... \\ &= \mathsf{Step}^n(\mathcal{D}\llbracket x~x\rrbracket_{[x\mapsto \mathsf{Step}^m(\mathcal{D}\llbracket λx. x~x\rrbracket_{[]})]}) \\ &= ... \\ &= \mathsf{Step}^ω = \mathit{diverge} \\ \end{aligned}

So that’s already nicer than with \mathcal{I}\llbracket\cdot\rrbracket. Another key difference from \mathcal{I}\llbracket\cdot\rrbracket is that I believe \mathcal{D}\llbracket\cdot\rrbracket to be total, which implies that it maps total environments to total denotations.[3] (I am very optimistically omitting a proof here, which would be unlikely to be novel; see the footnote.)
An element d of a domain is total[4] (or, maximal) if there are no other elements d' above it, so d ⊑ d' \implies d = d'. An element d is partial if there exists an element d' strictly above it, d ⊏ d'.
Clearly, the bottom element \bot is as partial as it gets (in a non-trivial
domain such as D).
There are other finite partial elements such as \mathsf{Step}(\bot), but also
infinite partial elements such as (\bot,(\bot,(\bot,...))) (if our D had a
pair constructor).
There are finite total elements such as \mathsf{Ret}(\mathsf{Stuck}) and infinite total elements such as \mathsf{Step}^ω (corresponding to diverge).
In this precise sense, the vanilla denotational semantics \mathcal{I}\llbracket e\rrbracket is a partial element of the monotone function space \mathsf{Map}(\mathsf{Name}, D) \to D (it returns \bot in a couple of situations), whereas \mathcal{D}\llbracket e\rrbracket is a total element of that function space. Note how this notion of partiality differs from partiality of a mathematical function: \mathcal{I}\llbracket e\rrbracket is a partial element, but a total function.
Reasoning and computing with total elements
The crucial insight is this: In programming languages, we can only really reason equationally about and compute with total elements (finite or infinite), but not so easily with partial ones, because that requires reasoning in the actual formal semantics for the programming language (rather than reasoning equationally on its syntax).
It is tiresome that in order to reason about diverging programs in our object language Exp, we need to know whether a Haskell program diverges. This means we need to know two formal semantics: a small one for Exp and a big one for Haskell.
Total elements and guarded recursion abstract away the approximation order
There is another advantage to such total denotational semantics: since total elements have no elements above them, the approximation order on total elements is discrete, just as we are used to from set theory. That is, \mathsf{Ret}(\mathsf{Stuck}) is incomparable to \mathsf{Step}(\mathsf{Ret}(\mathsf{Stuck})) because both are total elements. Thus, one can write total denotational semantics without thinking about the approximation order, except in one specific case: encodings of recursion.
Our denotational semantics encodes recursion in the seemingly paradoxical type of the \mathsf{Fun} constructor, the field of which is a function taking a D as argument. Such negative recursive occurrences are the bane of semanticists, but they have developed a remedy: Guarded type systems.
Guarded types erase to continuous definitions
In a guarded type system, one would declare that the function stored in \mathsf{Fun} has type \blacktriangleright D \to D, where \blacktriangleright is an applicative functor called the later modality, and the type checker would happily accept such a type. The later modality can be thought of as a principled way to introduce a lazy thunk. As such, we also change the type of \mathsf{Step} to \blacktriangleright D \to D, encoding the lazy nature of \mathsf{Step}. Do note that the d introduced in the lambda case of \mathcal{D}\llbracket\cdot\rrbracket has type \blacktriangleright D, so it is important to give \mathsf{Step} this delaying type in order for the interpreter to be well-typed and total.
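Haskell cannot enforce guardedness, but we can still sketch the shape of the guarded domain equation; the names below and the unchecked Later stand-in for \blacktriangleright are mine:

-- An unchecked stand-in for the later modality ▸; a guarded type checker
-- would ensure that a Later value can only be eliminated “one step later”.
newtype Later a = Later a

data GValue = GStuck | GFun (Later GD -> GD) -- Fun : (▸D → D) → Value
data GD = GRet GValue | GStep (Later GD)     -- Step : ▸D → D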
Et voilà, we have just proven that \mathcal{D}\llbracket\cdot\rrbracket is well-defined and total without (directly) appealing to any domain theory!
Presently, only a few proof assistants support guarded types, but Agda is one of them (albeit with rather rudimentary support). You can find an Agda encoding of \mathcal{D}\llbracket\cdot\rrbracket in this gist. A self-contained Haskell playground for denot can be found here.
Conclusion
This post was kind of a mixed bag of thoughts I had while working on our paper Abstracting Denotational Interpreters, where I found it frustrating that I was lacking both the space and the words to phrase this article into a proper motivation for the use of guarded type theory. The ultimate goal of this post is to lure people who are familiar with traditional denotational semantics into learning about total denotational semantics and guarded type theory.
I sincerely believe that guarded type theory is the future for total, executable reasoning about what today is often branded coinduction, since it is much more widely applicable than mere coinduction, but also different in subtle ways.[5] (For example, D could not be defined by coinduction because of the negative recursion in Fun.) Furthermore, it comes with decent (yet still incomplete) axiomatisations in proof assistants.
I have glossed over a lot of related work in this post in order not to break the flow.
Apologies for that.
I suggest having a look at the PLS lab page on guarded recursion, and perhaps starting by reading Bob Atkey and Conor McBride’s classic Productive Coprogramming with Guarded Recursion (which does not yet cover negative recursive occurrences).
One of the most exciting recent results in this area is a formal model for impredicative guarded dependent type theory (phew!), which can serve as the justification for an axiomatisation in Rocq and Lean, and which has since been used to define the equivalent of the Iris higher-order concurrent separation logic framework.
The trace type T above is a much simplified version of a guarded interaction tree.
Definitional equality on (guarded or Scott) D is also not a very useful program equivalence; you would still need to appeal to the coarser contextual equivalence or a custom logical relation to prove interesting properties about program rewrites.
My interest is in the context of abstract interpretation, where it is sufficient
to view equivalence modulo the abstraction function α.
1. The classic text demonstrating these issues is A Syntactic Approach to Type Soundness.↩︎
2. Incredibly smart semanticists such as Lars Birkedal, Jon Sterling and others are continually improving the “internal language” (which I think means “domain-specific language” to category theorists) for defining such complex semantic domains; this is the subject of synthetic domain theory. This talk by Jon (with questions from 90-year-old Dana Scott!) provides a lot of historic context and can serve as a technical introduction as well.↩︎
3. On the monotone function space D \to D, a function f is only total when it is also maximally non-strict, that is, f(d) must be the greatest lower bound of its upper set in the range of f. More formally, f(d) = ⊓ \{ d' \mid d'∈ \mathsf{rng}(f) \land f(d) ⊑ d' \}. For any function f defined on total inputs, this can easily be achieved by extending it to partial inputs via f(d) := ⊓ \{ f(d') \mid d' ∈ D \land d' \text{ total} \land d ⊑ d' \}. Note that any f' such that f ⊏ f' must overcommit for one input d_1 (so f(d_1)⊏f'(d_1)), but then there must exist another d_2 such that d_1 ⊏ d_2 for which f(d_2) \not⊑ f'(d_2), otherwise f'(d_1) would be a greater lower bound than f(d_1); contradiction.↩︎
4. This is at least the terminology used in Cartwright et al., which I very much enjoyed.↩︎
5. For a comparison of induction, coinduction and guarded recursion, have a look at Section 16 of the Iris lecture notes.↩︎