You should definitely put down your newest, shiniest tool that promises to unlock infinite productivity and learn something just for yourself, for fun, slowly, clumsily. You'll be left with something that can never be taken away: experience!
I wanted to learn Zig, so I decided I would write a lambda calculus interpreter with just the standard library. I also wrote a Haskell implementation acting as a baseline; Haskell is an entirely different class of language and it'll be fun to talk about the differences in implementation.
You can follow along in any language and try to write your own lambda calculus interpreter. I'll first highlight the general architecture used in the Haskell implementation and then present how the same steps were implemented in Zig.
And a note: I apologize for the formatting of the code examples. Following along from the source might be easier.
You can skip the next section if you know what lambda calculus is.
Lambda Calculus was invented in the 1930s by logician Alonzo Church as he was attempting to improve on Russell's type-theoretical foundation of mathematics.
Many notations for function substitution were used before, but Lambda Calculus was the first one where substitution and alpha conversion were formalized.
In essence, lambda calculus defines "lambda terms" which can be "reduced" or computed, and there is a single computational rule. Having a single computational rule seems very limited, but we'll discover we can define anything in terms of this single operation.
A lambda term can either be:
- A variable: x, y, a, b, or banana.
- An abstraction: λx.x, which is written as a Greek letter lambda λ (or \) followed by a variable (e.g. x), a dot ., and another lambda term (the variable x in this case). We'll refer to the variable next to the lambda as a captured variable, and to the lambda term following the dot as the body of the abstraction.
- An application: ((λx.x) k) or just (a b). An application is written as two consecutive terms, and if the first term happens to be an abstraction we can perform beta reduction.

When multiple terms are written in succession, like a b c, it is understood as a sequence of applications with left-associativity, so a b c = ((a b) c) and not (a (b c)).
Abstractions behave like anonymous functions, so f(x) = x (the identity function) could be written in lambda calculus as λx.x.
To continue our analogy, beta reduction is like function application, and f(3) = 3 (function evaluation) would be written as the following reduction: (λx.x) 3 -> 3. (The arrow is not official notation.)
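The analogy is direct in Haskell itself, where λx.x is spelled \x -> x:

```haskell
-- Beta reduction is just function application: the identity
-- function applied to 3 evaluates to 3.
main :: IO ()
main = print ((\x -> x) (3 :: Int))
```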
To "reduce" an application, we substitute the variable that follows the lambda inside the function body with the second term of the application. For instance, to reduce (λx.x x) k we have to substitute every instance of x (the variable to the right of the lambda symbol) inside the abstraction's body, x x in this case, with the second term k. This becomes just k k, which is not very interesting, but think about the following expression and reduction steps:
(λa.a a) (λx.x)
(λx.x) (λx.x)
(λx.x)
We can often perform multiple reductions until a term becomes irreducible. In this example, we substituted a inside a a with the identity (λx.x). Then, we applied the identity to itself, which yields the identity again.
Sometimes we can perform reductions forever:
(λx.x x) (λx.x x)
(λx.x x) (λx.x x)
(λx.x x) (λx.x x)
(λx.x x) (λx.x x)
(λx.x x) (λx.x x)
Trying to reduce this term results in a copy of the original term, which sets us back at square one.
There are many more interesting lambda terms we'll cover later, but this information is enough to write an interpreter.
We won't go too much into alpha renaming, but it suffices to say it's the way we can formally say that λx.x is the same identity as λy.y and we can safely interchange the two. This is like stating that the function f(x) = x*2 and g(y) = y*2 are the same, which is intuitively true but laid down formally in lambda calculus.
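We can make alpha equivalence concrete with a small sketch (the helper names are hypothetical, separate from the interpreter we're about to build): converting terms to de Bruijn indices erases bound-variable names entirely, so alpha-equivalent terms become structurally equal.

```haskell
import Data.List (elemIndex)

-- A minimal term type, mirroring the AST we'll define later.
data Expr = Var String | Appl Expr Expr | Abs String Expr

-- Nameless representation: bound variables become indices into the
-- enclosing binders, free variables keep their names.
data DB = DFree String | DBound Int | DAppl DB DB | DAbs DB
  deriving Eq

toDB :: [String] -> Expr -> DB
toDB env (Var x)    = maybe (DFree x) DBound (elemIndex x env)
toDB env (Appl f a) = DAppl (toDB env f) (toDB env a)
toDB env (Abs x e)  = DAbs (toDB (x : env) e)

alphaEq :: Expr -> Expr -> Bool
alphaEq a b = toDB [] a == toDB [] b

main :: IO ()
main = do
  print (alphaEq (Abs "x" (Var "x")) (Abs "y" (Var "y"))) -- True
  print (alphaEq (Abs "x" (Var "x")) (Abs "y" (Var "z"))) -- False
```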
We're aiming to implement our lambda calculus interpreter as a library whose functionality can be used separately. We'll code as library authors and be our own library consumers in a minimal "main" file that provides a basic REPL (read-eval-print-loop). Follow the same guidelines in your language for extra fun.
Each implementation should provide parsing, pretty printing, and evaluation; that should be enough to implement the aforementioned REPL.
We'll use the standard library (or distribution) only, or stay as close to it as possible, writing "idiomatic" code.
We'll start with parsing lambda terms such as λc.c(λx.x), λx.x, y or λa.(a b c) into an abstract syntax tree. We also need to define such a tree.
Lambda terms can contain other lambda terms, which makes it tempting to use a recursive abstract syntax tree. This could be encouraged or discouraged depending on your language.
Parsing will be the hardest and least interesting part. Our general strategy will be to implement bits of code that parse each of the three lambda term variants, composed into a backtracking parser: it attempts to parse each variant by calling itself recursively, picking another alternative on failure.
Backtracking, recursive descent parsers are easy to reason about and write but can get very slow. If you follow along I suggest doing whatever is simplest to parse your strings into AST nodes.
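The backtracking idea can be sketched in a few lines of Haskell (hypothetical names, not the parser we'll actually build): a parser maps input to Maybe (result, remaining input), and because parsers never consume their argument destructively, "backtracking" is just handing the same string to the next alternative.

```haskell
-- A parser returns the parsed value and the unconsumed rest of the
-- input, or Nothing on failure.
type Parser a = String -> Maybe (a, String)

char :: Char -> Parser Char
char c (x : xs) | x == c = Just (c, xs)
char _ _ = Nothing

-- Try p; on failure, fall back to q on the very same input.
orElse :: Parser a -> Parser a -> Parser a
orElse p q input = case p input of
  Nothing -> q input
  ok      -> ok

main :: IO ()
main = print (orElse (char 'a') (char 'b') "banana")
```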
I won't derive a formal grammar for lambda calculus myself; here's a simple EBNF grammar according to the University of Maryland:
x ::= any variable name
e ::= x | ( e e ) | ( λ ( x ) e )
v ::= ( λ ( x ) e )
A nice feature we'd love to support that isn't present in this specific grammar is lists of applications, such as (a b c d e) -> ((((a b) c) d) e). You can definitely write lambda calculus without it, and it might be challenging to implement depending on how you write your parser.
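One way to support such lists, sketched here with a hypothetical minimal Expr type, is to parse a flat sequence of terms and fold it to the left:

```haskell
import Data.List (foldl')

data Expr = Var String | Appl Expr Expr
  deriving Show

-- (a b c) parses as a head plus a flat list of arguments; folding
-- left with Appl rebuilds the left-associative chain ((a b) c).
chain :: Expr -> [Expr] -> Expr
chain = foldl' Appl

main :: IO ()
main = print (chain (Var "a") [Var "b", Var "c"])
```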
We'll implement a function to substitute a variable inside an expression. We'll use this subst function to implement beta reduction, and lastly we'll provide a method that reduces terms until no further reduction can happen (or until we're stuck).
Care must be taken in the subst function to avoid substituting variables captured by lambda abstractions.
This is quite fun, and during the implementation we'll have to think deeply about how programming languages in general implement "evaluation".
Lastly, we'll cheat and let the user actually name a lambda term for later, like this: id = λx.x, and write a second evaluator function that takes into account these user defined variables, allowing them to write id 3 and get 3 as expected.
In both my implementations I defined a new data type for declarations and used some kind of HashMap to model the environment.
If we did great on the first steps this should be a fairly straightforward extension to our current code, and will teach us a little about stateful computation (the state being the user-defined env) when implementing the "resolve" function.
Care must be taken not to substitute captured variables: e.g. \id.id mustn't change even if {id: \x.x} is in our env.
The best part about naming lambda expressions is that it lets us write massive lambda terms without going insane, by composing together many simple named terms.
Haskell was practically invented to let undergraduates write lambda calculus interpreters, a joke that isn't too far from reality as its main influence, Standard ML (Meta Language), was in fact engineered to write compilers and interpreters.
I also have prior experience with Haskell and know for a fact that I was once able to write a lambda calculus interpreter with it. I'm sort of rusty, though.
Haskell is sadly in a sorry state where the most famous compiler essentially defines the language, as the old Haskell 2010 standard remains the last official reference.
The ecosystem, which is entirely built on top of GHC, has evolved and thrived with compiler-specific extensions. Any serious and modern Haskell code requires 10+ compiler extensions to work. The most popular extensions were supposed to be included in a Haskell 2020 standard, but interest died out. Contributors are wanted.
Sometimes I have trouble finding official documentation. The standard library has really fancy and clean pages on Hackage, and the official documentation for the language is here, I guess. The standard is 10 years old. I wish someone could point me to something akin to the beautiful Rust reference book, where I can get a quick and friendly refresher on the language basics.
This is the boring AST we come up with. We're using the text package (imported qualified as T), which isn't standard library but is widely recommended in place of default Strings.
Before Rust, Haskell pioneered the practice of having many confusing string types, since the default string type is a lazy, linked list of single characters. A lazy, linked list of single characters is great for writing streaming, lazy functions, but not so great for memory layout and almost never what you want to use to represent a string.
T.Text is a boring UTF-8 byte array (since version 2.0 of text).
type Ident = T.Text
data Expr = Var Ident | Appl Expr Expr | Abs Ident Expr
deriving Eq
Haskell does not complain about the recursive nature of our data structure. It has three |-separated variants: the variable, the application, and the abstraction.
Instead of reaching for a parser library (probably the best idea) we'll try to implement
an instance of the Read typeclass. This lets us call the function read :: Read a => String -> a with our custom type as return argument, resolving to our read implementation here.
This led to the following implementation, which I'm happy with despite not being completely sure about it.
instance Read Expr where
readPrec = readExpr
readListPrec = readListPrecDefault
readExpr :: ReadPrec (Expr)
readExpr = readAppl <++ readSingle
readSingle :: ReadPrec (Expr)
readSingle = parens readLambda <++ (fmap Var readIdent) <++ paren readExpr
readAppl :: ReadPrec (Expr)
readAppl = do
first <- readSingle
lift skipSpaces
items <- lift $ many1 $ do
x <- readPrec_to_P readSingle 0
skipSpaces
return x
return $ foldl' Appl first items
readLambda :: ReadPrec (Expr)
readLambda = do
_ <- choice [readChar 'λ', readChar '\\']
ident <- readIdent
_ <- readChar '.'
lift skipSpaces
rest <- readExpr
return $ Abs ident rest
readIdent :: ReadPrec T.Text
readIdent = do
ident <- lift (munch1 isAlphaNum)
return $ T.pack ident
readChar :: Char -> ReadPrec Char
readChar c = lift (char c)
The suggested way to implement read was using the ReadPrec monad: a function with type ReadPrec (Type) is a parser that returns said type, and you can compose them using do notation and other functions like choice, many, or the <++ operator.
Our readExpr is a composition of simpler and simpler parsers, down to readChar and readIdent, which match a single character and an identifier respectively.
All is good except for readAppl: it's very annoying to create a left-associative chain of operations using a recursive descent parser: we have to parse a b c as ((a b) c), meaning the first term could be "very deep". To fix this, we actually parse a full list of terms, then we foldl' over that list to build the final left-associative chain.
Inside of do blocks we can also use the var <- readExpr syntax, which means "read an expression and bind the result, or fail and short-circuit the whole block". A lot of control flow is encoded this way!
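That short-circuiting can be seen in isolation with the simpler Maybe monad (just an illustration, not part of the parser):

```haskell
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv a b = Just (a `div` b)

calc :: Maybe Int
calc = do
  x <- safeDiv 10 2 -- succeeds: x is bound to 5
  y <- safeDiv x 0  -- fails: the rest of the block never runs
  return (x + y)

main :: IO ()
main = print calc -- the whole computation collapses to Nothing
```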
There is some confusing interplay with the older parsing types ReadS and ReadP: ReadS is the one mentioned in the standard, but it's much slower than the efficient, GHC-only ReadPrec. The lift function translates parsers from the underlying ReadP monad into ReadPrec, like skipSpaces. This was kind of confusing and just adds clutter.
It's worth noting that most Haskell parsing libraries also make use of do notation and monads, providing a very clean and nice interface. I used attoparsec a lot, and other parsec clones.
I'm interested in hearing from a seasoned Haskeller what I could've done better here.
sub :: Ident -> Expr -> Expr -> Expr
sub id0 arg (Var id1) | id0 == id1 = arg
sub _ _ (Var id1) = Var id1
sub id0 _ (Abs id1 expr) | id0 == id1 = Abs id1 expr
sub id0 arg (Abs id1 expr) = Abs id1 (sub id0 arg expr)
sub id0 arg0 (Appl expr arg1) = Appl (sub id0 arg0 expr) (sub id0 arg0 arg1)
-- normal order
eval :: Expr -> Expr
eval (Appl (Abs ident expr) arg) = sub ident arg expr
eval (Appl expr arg) = Appl (eval expr) arg
eval x = x
steps :: Expr -> [Expr]
steps = fix (\rec x -> if x == eval x then [x] else x:(rec $ eval x))
The substitution function is simple: it only needs to avoid captured variables.
When implementing the evaluation function, a question naturally arises: we need to find a suitable application to reduce, but what if there are multiple candidates? In the following term we have two:
((λx.λy.y) ((λx.x x) (λx.x x))) b
We could apply the leftmost, outermost reduction by substituting ((λx.x x) (λx.x x)) in place of x inside λy.y. There's no x in the body, so the argument gets discarded:
((λx.λy.y) ((λx.x x) (λx.x x))) b ->
((λy.y)) b ->
b
Or we could stubbornly apply the innermost, leftmost beta reduction available to us, in the (λx.x x) (λx.x x) bit. As we saw earlier this evaluates to itself, giving us the original expression unchanged.
((λx.λy.y) ((λx.x x) (λx.x x))) b ->
((λx.λy.y) ((λx.x x) (λx.x x))) b ->
((λx.λy.y) ((λx.x x) (λx.x x))) b ->
... no progress
The order in which we pick the reduction can actually dictate whether we'll ever reach a final state. Good news: always picking the leftmost, outermost reduction WILL take you to the same final term, if such a term exists.
Most programming languages tend to evaluate the innermost, leftmost function first: when you call a function, all its arguments are evaluated before the function body. Haskell is one of the few languages where this isn't true and bodies are "substituted" before their arguments.
After the interpreter is done, you'll be able to play around with different evaluation strategies and see how they differ.
The steps function will repeatedly call eval until x == eval x. There's a simpler way to implement this but I admittedly just wanted to use fix here.
There is no function to get the normal form, as steps suffices:
normal :: Expr -> Expr
normal = last . steps
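To see the whole evaluator in one place, here's a self-contained sketch of the same sub/eval/steps pipeline, using String identifiers instead of T.Text so it runs with no extra packages. It checks that normal order happily discards a diverging argument, the looping term we met earlier:

```haskell
data Expr = Var String | Appl Expr Expr | Abs String Expr
  deriving (Eq, Show)

-- Substitute arg for the identifier, stopping at shadowing binders.
sub :: String -> Expr -> Expr -> Expr
sub i a (Var j) | i == j    = a
                | otherwise = Var j
sub i _ (Abs j e) | i == j = Abs j e
sub i a (Abs j e) = Abs j (sub i a e)
sub i a (Appl f x) = Appl (sub i a f) (sub i a x)

-- One normal-order step: reduce the leftmost, outermost redex.
eval :: Expr -> Expr
eval (Appl (Abs i e) a) = sub i a e
eval (Appl f a) = Appl (eval f) a
eval x = x

steps :: Expr -> [Expr]
steps x = if x == eval x then [x] else x : steps (eval x)

main :: IO ()
main = do
  let omega = Abs "x" (Appl (Var "x") (Var "x"))
      loop  = Appl omega omega
      -- ((λx.λy.y) Ω) b: the looping argument is never evaluated
      term  = Appl (Appl (Abs "x" (Abs "y" (Var "y"))) loop) (Var "b")
  print (last (steps term))
```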
At this point we can test our functions with GHCi or a simple main, since we have all the components for basic read -> eval -> print.
However, it gets very tedious to write long lambda terms manually. Let's implement a simple dialect of our lambda calculus for interactive use: every line is either an expression to evaluate or a definition.
The syntax for definitions will be simply name = <lambda term>. We define a new data type to reflect this that can either be a declaration or a bare expression, along with Read, Show implementations.
data EnvExpr = Decl Ident Expr | Base Expr
instance Read EnvExpr where
readPrec = readEnvExpr
readListPrec = readListPrecDefault
instance Show EnvExpr where
show (Decl ident expr) = printf "%s = %s" ident (show expr)
show (Base expr) = show expr
readEnvExpr :: ReadPrec (EnvExpr)
readEnvExpr = readDecl <++ (Base <$> readExpr)
readDecl :: ReadPrec (EnvExpr)
readDecl = do
ident <- readIdent
lift skipSpaces
_ <- readChar '='
lift skipSpaces
expr <- readExpr
return $ Decl ident expr
Now we just need to store definitions as we parse them, and to implement a new function that will replace all variables (as in (const id a)) with the terms stored in our env (like {const: \x.\y.x, id: \x.x}) until we are left with a bare expression again (like ((\x.\y.x) (\x.x) a)).
I modeled the environment using a classic Data.Map from containers. To model stateful computation, I just had to use the State monad.
type Env = Map Ident Expr
readAndStore :: [String] -> [String]
readAndStore = fst . evalAndStore empty
where evalAndStore = flip inner
inner = runState . read_store_monad
read_store_monad :: [String] -> State Env [String]
read_store_monad [] =
return []
read_store_monad (!x:xs) = do
!x' <- case ((read x)::EnvExpr) of
Decl i e -> (\s -> s ++ " -> " ++ (show . last .steps $ e)) <$> put_print i e
Base e -> (\s -> show s ++ " -> " ++ (show . last . steps $ s)) <$> eval_env e
xs'<- read_store_monad xs
return $ x':xs'
put_print :: Ident -> Expr -> State Env String
put_print i e = do
env <- get
put (insert i e env)
pure $ show $ Decl i e
eval_env :: Expr -> State Env Expr
eval_env (Var i) = do
env <- get
case MP.lookup i env of
Just x -> return x
Nothing -> return $ Var i
eval_env (Appl a b) = do
a' <- eval_env a;
b' <- eval_env b;
return $ Appl a' b'
eval_env (Abs k b) = do
env <- get
b' <- case MP.lookup k env of
Just x -> do
put (delete k env)
eval_env x
Nothing -> eval_env b
return $ Abs k b'
The State A B monad lets us write functions that return B and have read/write access to a state A.
Inside of do notation we can <- get the state or put (<new_state>) as we see fit, and return our desired types.
eval_env does the heavy lifting of inserting new declarations in the environment or removing them temporarily when they're captured by a lambda expression.
put_print inserts and prints in a single operation: when a user types id = \x.x in the repl they'll receive a printout of the declaration again.
read_store_monad performs the fun task of reading the whole input line by line and returning a list of output lines inside a State monad. Since Haskell lists are lazy, these will actually work as streams if needed.
readAndStore doesn't have the State monad in its signature: it takes a list of lines as input, runs the state monad to perform the computation, and returns the output.
Now, because the output of our State monad is a list, Haskell won't bother computing the full list but will just fetch elements as they are needed. Laziness is preserved in the State monad and in our functions too, meaning they work with streaming data for free.
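A small demonstration of this (with hypothetical names, assuming the lazy State monad from mtl used above): a stateful line-numbering pass can consume an infinite input, because only the demanded prefix of the output list is ever computed.

```haskell
import Control.Monad.State

-- Prefix each line with a counter threaded through the State monad.
numberLines :: [String] -> State Int [String]
numberLines [] = return []
numberLines (x : xs) = do
  n <- get
  put (n + 1)
  rest <- numberLines xs
  return ((show n ++ ": " ++ x) : rest)

main :: IO ()
main =
  -- The input is infinite, yet take 2 terminates: laziness survives
  -- the stateful computation.
  print (take 2 (evalState (numberLines (repeat "hi")) 1))
```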
prompt :: String
prompt = "λ "
main :: IO ()
main = do
putStr prompt
hFlush stdout
hSetBuffering stdout NoBuffering
interact $ prompt_inner
prompt_inner :: String -> String
prompt_inner = foldMap (\s -> s ++ ("\n" ++ prompt)) . readAndStore . lines
In our IO module we invoke readAndStore inside another function which is just String -> String. These two strings are the whole output and whole input of our program. They're also default Haskell strings: a lazy linked list of single characters.
Feeding the function to interact will hook it to stdin -> stdout, and thanks to laziness it will start producing output as soon as enough input is available.
Monads are very nice and not just some gimmick. I've used three just in this small example (State, ReadPrec, and IO!) for wildly different contexts, and do notation makes sense in each of them.
The tooling has improved tremendously and is on par with Rust: while stack has always been very mature, easy, and an inspiration for cargo and friends, the HLS (Haskell Language Server) is now working great and out-of-the-box for most projects.
This style of code is still mostly unreadable for most programmers used to other programming paradigms. Could you follow in your language of choice?
Getting started is okay: I used mise to handle zig and zls installation (0.16.0).
Zig has a single page html as language reference and ships the standard library's documentation offline via zig std.
I consulted the language reference, the standard library, and my friend TRGWII for Zig wisdom. I wrote an initial version, then he gave me a lot of stylistic suggestions and sent me a modified parser design. I took that design and finished the interpreter.
const Ident = struct {
name: []const u8,
fn deinit(_: Ident, _: std.mem.Allocator) void {}
};
const Appl = struct {
f: *AST,
a: *AST,
fn deinit(self: Appl, alloc: std.mem.Allocator) void {
self.f.deinit(alloc);
alloc.destroy(self.f);
self.a.deinit(alloc);
alloc.destroy(self.a);
}
};
const Abs = struct {
v: []const u8,
e: *AST,
fn deinit(self: Abs, alloc: std.mem.Allocator) void {
self.e.deinit(alloc);
alloc.destroy(self.e);
}
};
pub const AST = union(enum) {
ident: Ident,
appl: Appl,
abs: Abs,
/// many methods here
The lack of a string type is frustrating until you actually embrace the []const u8 oriented philosophy, where actually everything is just bytes and you assign meaning to them.
Usually my compiler greatly aids me in this task; in Zig the underlying representation is laid bare: my identifier is once again just a list of numbers.
At some point I must've learned that a C char is a number before being a character. No facade here.
I could not find anything helpful in the standard library for building parsers, so I made my own. My initial design was just a recursive descent parser built from a bunch of functions that take a string and return (Result, how_much_of_input_I_parsed). TRGWII suggested using a Parser struct which takes an allocator just once, keeping the position inside the input string as internal state. Fair enough.
pub const Parser = struct {
src: []const u8,
i: usize = 0,
alloc: std.mem.Allocator,
// many methods and entire AST definition actually
fn ident(self: *Self) ParseError!Ident {
if (self.i >= self.src.len) return ParseError.EOF;
var i = self.i;
for (self.src[self.i..]) |c| {
if (std.ascii.isAlphabetic(c) or c == '-' or c == '_') {
i += 1;
} else {
break;
}
}
if (self.i == i) return ParseError.MissingId;
const name = self.src[self.i..i];
self.i = i;
return .{ .name = name };
}
fn abs(self: *Self) !Abs {
const restore = self.i;
errdefer self.i = restore;
if (self.src.len <= self.i) return error.EOF;
if (self.src[self.i] == '\\')
self.i += 1
else if (std.mem.startsWith(u8, self.src[self.i..], "λ"))
self.i += 2
else
return error.MissingLambda;
const id = try self.ident();
if (self.src.len <= self.i) return error.EOF;
if (self.src[self.i] != '.') return error.MissingDot;
self.i += 1;
const ast = try self.alloc.create(AST);
errdefer self.alloc.destroy(ast);
ast.* = try self.parse();
return .{ .v = id.name, .e = ast };
}
// other methods are omitted for brevity
While in the Haskell version this was hidden inside control flow, here we backtrack very explicitly: at the start of each parsing function we store the current i, our index into the source string. With errdefer, i is restored to its initial value on error: if parsing fails, we go back to where we started (maybe to try another variant?).
I thought of my AST as immutable, owning its own subnodes. My design turned out to be very noisy, with a lot of allocations and frees taking place. I know this is not the best Zig style. The strings in identifiers are not owned: they belong to the input string, so they are not freed or handled by the parser.
In fact, the .deinit method for identifiers kind of doesn't do anything.
This is the proper way in Zig to do a switch on variants: implement the ".deinit" method or anything for each variant, then write something like this:
fn deinit(self: AST, allocator: std.mem.Allocator) void {
switch (self) {
inline else => |variant| variant.deinit(allocator),
}
}
This will get inlined to a proper switch where we call the .deinit method of every variant.
I then wrote my equivalent functions to perform a single substitution, do a single evaluation step, and reduce until normal form:
pub fn subst(self: AST, id: Ident, arg: *const AST, allocator: std.mem.Allocator) error{OutOfMemory}!*AST {
// std.debug.print("I put {f} in {f} in place of {s}\n", .{ arg, self, id.name });
switch (self) {
.abs => |a| {
var cloned_expr = try allocator.create(AST);
cloned_expr.* = self;
if (!std.mem.eql(u8, id.name, a.v)) {
cloned_expr.abs.e = try a.e.subst(id, arg, allocator);
cloned_expr.abs.v = a.v;
} else {
cloned_expr.abs.e = try a.e.clone(allocator);
cloned_expr.abs.v = a.v;
}
return cloned_expr;
},
.appl => |app| {
var cloned_expr = try allocator.create(AST);
cloned_expr.* = self;
cloned_expr.appl.f = try app.f.subst(id, arg, allocator);
cloned_expr.appl.a = try app.a.subst(id, arg, allocator);
return cloned_expr;
},
.ident => |idn| {
if (std.mem.eql(u8, id.name, idn.name)) {
return try arg.clone(allocator);
} else {
var cloned_expr = try allocator.create(AST);
cloned_expr.* = self;
cloned_expr.ident = Ident{ .name = self.ident.name };
return cloned_expr;
}
},
}
}
const EvalError = error{ MalformedAst, OutOfMemory };
pub fn eval(self: AST, allocator: std.mem.Allocator) EvalError!*AST {
switch (self) {
.appl => |app| {
switch (app.f.*) {
.abs => |a| {
return try a.e.subst(.{ .name = a.v }, app.a, allocator);
},
else => {
var cloned_expr = try allocator.create(AST);
cloned_expr.* = self;
cloned_expr.appl.a = try app.a.clone(allocator); // deep clone arg
cloned_expr.appl.f = try app.f.eval(allocator); // eval function body
return cloned_expr;
},
}
},
.abs => |a| {
var cloned_expr = try allocator.create(AST);
cloned_expr.* = self;
cloned_expr.abs.e = try a.e.eval(allocator); // eval function body
cloned_expr.abs.v = a.v;
return cloned_expr;
},
else => {
return try self.clone(allocator);
},
}
}
pub fn normal(self: *AST, allocator: std.mem.Allocator) EvalError!*AST {
var res = try self.eval(allocator);
var prev = try self.clone(allocator);
while (!res.eql(prev.*)) {
prev.deinit(allocator);
prev = res;
res = try res.eval(allocator);
}
prev.deinit(allocator);
return res;
}
They're very similar to what I did in Haskell, just written in an imperative style and with more allocations than I would like.
Just like in Haskell, I added a new parser and AST for declarations. This parser also keeps track of its position in a source string with an i variable. You can actually mix and match them with no issues if you create two parsers with the same .src string and keep track of i. Since Zig offers no encapsulation for fields, you can just edit i to your liking.
I also integrated the environment inside the parser, which is a bit of a cheat. I used a boring std.StringHashMap and had to create an OwnedAST version of my previous AST to put inside the env.
The parser has a resolve() method which takes a lambda term and returns one with all variables "resolved". To avoid substituting captured variables this time, I modify the env, removing the term and adding it back when I'm done.
This is silly; the usual technique is to create a more sophisticated env (such as a linked list of environments), but it worked fine for my small Zig program.
pub const EnvParser = struct {
src: []const u8,
i: usize = 0,
alloc: std.mem.Allocator,
env: std.StringHashMap(Parser.OwnedAST),
parser: Parser,
const Self = @This();
const ParseError = error{MissingEqual} || Parser.ParseError || std.mem.Allocator.Error;
const EvalError = Parser.AST.EvalError;
pub fn init(src: []const u8, alloc: std.mem.Allocator) Self {
return Self{ .src = src, .i = 0, .alloc = alloc, .env = std.StringHashMap(Parser.OwnedAST).init(alloc), .parser = Parser{ .alloc = alloc, .src = src, .i = 0 } };
}
pub fn deinit(self: *Self) void {
var iter = self.env.iterator();
while (iter.next()) |entry| {
entry.value_ptr.deinit(self.alloc);
self.alloc.free(entry.key_ptr.*);
}
self.env.deinit();
}
pub fn loadnew(self: *Self, newline: []const u8) void {
self.i = 0;
self.src = newline;
self.parser.src = newline;
self.parser.i = 0;
}
pub const DEFAST = union(enum) {
definition: Definition,
bare: Parser.AST,
pub fn normal(self: DEFAST, allocator: std.mem.Allocator) EvalError!*Parser.AST {
switch (self) {
inline else => |variant| return variant.normal(allocator),
}
}
pub fn format(self: DEFAST, writer: anytype) !void {
try switch (self) {
.definition => |def| writer.print("{s} = {f}", .{ def.name, def.exp }),
.bare => |bare| bare.format(writer),
};
}
};
pub fn parse(self: *Self) EvalError!DEFAST {
if (lambda_debug) {
depth += 1;
std.debug.print(".{s} {s}: with {s} and s {d}\n", .{ depth_str[0 .. depth * 2], @src().fn_name, self.src[self.i..], self.i });
}
defer if (lambda_debug) {
depth -= 1;
};
if (self.definition()) |def| {
return .{ .definition = def };
} else |_| {
self.parser.i = 0; // reset parser
if (self.parser.parse()) |ast| {
return .{ .bare = (try self.resolve(&ast)).* };
} else |err| return err;
}
}
pub fn resolve(self: *Self, expr: *const Parser.AST) EvalError!*Parser.AST {
switch (expr.*) {
.ident => |id| {
if (self.env.contains(id.name)) {
const env_expr = self.env.get(id.name) orelse return EvalError.NotInEnv;
return env_expr.ast.clone(self.alloc);
} else {
return expr.clone(self.alloc);
}
},
.appl => |a| {
const cloned_expr = try self.alloc.create(Parser.AST);
errdefer {
self.alloc.destroy(cloned_expr);
}
cloned_expr.* = expr.*;
cloned_expr.appl.f = try self.resolve(a.f);
cloned_expr.appl.a = try self.resolve(a.a);
return cloned_expr;
},
.abs => |a| {
const cloned_expr = try self.alloc.create(Parser.AST);
errdefer {
self.alloc.destroy(cloned_expr);
}
if (self.env.contains(a.v)) {
const cache = self.env.getEntry(a.v) orelse return EvalError.NotInEnv;
if (!self.env.remove(a.v)) return EvalError.NotInEnv;
cloned_expr.* = expr.*;
cloned_expr.abs.v = a.v;
cloned_expr.abs.e = try self.resolve(a.e);
try self.env.put(cache.key_ptr.*, cache.value_ptr.*);
} else {
cloned_expr.* = expr.*;
cloned_expr.abs.v = a.v;
cloned_expr.abs.e = try self.resolve(a.e);
}
return cloned_expr;
},
}
}
In the main we create an allocator to give to the library. We don't really care about freeing objects since we are using the amazing arena allocator, which will free everything in one swoop with arena.deinit();.
We also set up a lot of buffers to invoke the new and confusing IO interface, opening stdin() and stdout() manually and reading until a newline for user input. Then parse, eval, print.
const std = @import("std");
const lambda = @import("root.zig");
test {
std.testing.refAllDecls(@This());
}
const prompt = "λ ";
const lambda_debug = true;
pub fn main(init: std.process.Init) !void {
var arena = std.heap.ArenaAllocator.init(init.gpa);
const io = init.io;
defer arena.deinit();
const allocator = arena.allocator();
var stdin_buf: [1024]u8 = undefined;
var stdout_buf: [1024]u8 = undefined;
var stdin = std.Io.File.stdin().reader(io, &stdin_buf);
var stdout = std.Io.File.stdout().writer(io, &stdout_buf);
var parser = lambda.EnvParser.init("", allocator);
try stdout.interface.writeAll(prompt);
try stdout.interface.flush();
while (true) {
var line_buffer: [1024]u8 = undefined;
var w: std.Io.Writer = .fixed(&line_buffer);
const line_length = try stdin.interface.streamDelimiterLimit(&w, '\n', .unlimited);
stdin.interface.toss(1);
const input_line = line_buffer[0..line_length];
parser.loadnew(input_line);
var parsed = try parser.parse();
try stdout.interface.writeAll(try std.fmt.allocPrint(allocator, "{f} -> {f}\n{s}", .{ parsed, try parsed.normal(allocator), prompt }));
try stdout.interface.flush();
}
}
The Zig version demanded at least one new test for every piece of code I wrote. Writing tests runs your code and gives you a chance to catch memory leaks and other common memory bugs. If you don't write tests, it's very possible that your code won't even compile when you begin using it, as unused code is entirely discarded during compilation. This also means your Zig code is only as good as your tests, which is true for a lot of languages. I don't think I'd be able to write Zig without a line of tests for each line of code (or whatever it takes to reach 100% coverage).
I thought I'd have a much harder time than I did: once you accept Zig's hubris and design, it's a very powerful and self-contained language! I haven't even tried the integrated fuzzer yet. I feel I could be more productive in Zig after this experiment than in C, where you get the same problems but more awkward tools to solve them.
The program I picked was intentionally ill-suited to Zig's imperative style and very easy in Haskell, and despite this I really enjoyed the experience.
This was very fun and a lot could be improved. Please contact me if you have strong opinions about anything I said.
In the next episodes we will: