You should definitely put down your newest, shiniest tool that promises to unlock infinite productivity and learn something just for yourself, for fun, slowly, clumsily. You'll be left with something that can never be taken away: experience!
I wanted to learn Zig, so I decided I would write a lambda calculus interpreter with just the standard library. I also wrote a Haskell implementation acting as a baseline; Haskell is an entirely different class of language and it'll be fun to talk about the differences in implementation.
You can follow along in any language and try to write your own lambda calculus interpreter. I'll first highlight the general architecture used in the Haskell implementation and then present how the same steps were implemented in Zig.
And a note: I apologize for the formatting of the code examples. Following along from the source might be easier.
You can skip the next section if you know what lambda calculus is.
Lambda Calculus was invented in the 1930s by logician Alonzo Church as he was attempting to improve on Russell's type-theoretical foundation of mathematics.
Many notations for function substitution were used before, but Lambda Calculus was the first one where substitution and alpha conversion were formalized.
In essence, lambda calculus defines "lambda terms" which can be "reduced" or computed, and there is a single computational rule. Having a single computational rule seems very limited, but we'll discover we can define anything in terms of this single operation.
A lambda term can either be:
- A variable: x, y, a, b, or banana.
- An abstraction: λx.x, which is written as a Greek letter lambda λ (or \) followed by a variable (e.g. x), a dot ., and another lambda term (the variable x in this case). We'll refer to the variable next to the lambda as a captured variable, and to the lambda term following the dot as the body of the abstraction.
- An application: ((λx.x) k) or just (a b). An application is written as two consecutive terms, and if the first term happens to be an abstraction we can perform beta reduction.

When multiple terms are written in succession, like a b c, it is understood as a sequence of applications with left-associativity, so a b c = ((a b) c) and not (a (b c)).
Abstractions behave like anonymous functions, so f(x) = x (the identity function) could be written in lambda calculus as λx.x.
To continue our analogy, beta reduction is like function application, and f(3) = 3 (function evaluation) would be written as the following reduction: (λx.x) 3 -> 3. (The arrow is not official notation.)
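The analogy is direct in Haskell itself, where λx.x is spelled \x -> x:

```haskell
-- Beta reduction is just function application: the identity
-- function applied to 3 evaluates to 3.
main :: IO ()
main = print ((\x -> x) (3 :: Int))
```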
To "reduce" an application, we substitute the variable that follows the lambda inside the function body with the second term of the application. For instance, to reduce (λx.x x) k we have to substitute every instance of x (the variable to the right of the lambda symbol) inside the abstraction's body, x x in this case, with the second term k. This becomes just k k, which is not very interesting, but think about the following expression and reduction steps:
(λa.a a) (λx.x)
(λx.x) (λx.x)
(λx.x)
We can often perform multiple reductions until a term becomes irreducible. In this example, we substituted a inside a a with the identity (λx.x). Then, we applied the identity to itself, which yields the identity again.
Sometimes we can perform reductions forever:
(λx.x x) (λx.x x)
(λx.x x) (λx.x x)
(λx.x x) (λx.x x)
(λx.x x) (λx.x x)
(λx.x x) (λx.x x)
Trying to reduce this term results in a copy of the original term, which sets us back at square one.
There are many more interesting lambda terms we'll cover later, but this information is enough to write an interpreter.
We won't go too much into alpha renaming, but it suffices to say it's the way we can formally say that λx.x is the same identity as λy.y and we can safely interchange the two. This is like stating that the function f(x) = x*2 and g(y) = y*2 are the same, which is intuitively true but laid down formally in lambda calculus.
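We can make alpha equivalence concrete with a small sketch (the helper names are hypothetical, separate from the interpreter we're about to build): converting terms to de Bruijn indices erases bound-variable names entirely, so alpha-equivalent terms become structurally equal.

```haskell
import Data.List (elemIndex)

-- A minimal term type, mirroring the AST we'll define later.
data Expr = Var String | Appl Expr Expr | Abs String Expr

-- Nameless representation: bound variables become indices into the
-- enclosing binders, free variables keep their names.
data DB = DFree String | DBound Int | DAppl DB DB | DAbs DB
  deriving Eq

toDB :: [String] -> Expr -> DB
toDB env (Var x)    = maybe (DFree x) DBound (elemIndex x env)
toDB env (Appl f a) = DAppl (toDB env f) (toDB env a)
toDB env (Abs x e)  = DAbs (toDB (x : env) e)

alphaEq :: Expr -> Expr -> Bool
alphaEq a b = toDB [] a == toDB [] b

main :: IO ()
main = do
  print (alphaEq (Abs "x" (Var "x")) (Abs "y" (Var "y"))) -- True
  print (alphaEq (Abs "x" (Var "x")) (Abs "y" (Var "z"))) -- False
```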
We're aiming to implement our lambda calculus interpreter as a library whose functionality can be used separately. We'll code as library authors and be our own library consumers in a minimal "main" file that provides a basic REPL (read-eval-print-loop). Follow the same guidelines in your language for extra fun.
Each implementation should provide parsing, pretty printing, and evaluation; that should be enough to implement the aforementioned REPL.
We'll use the standard library (or distribution) only, or stay as close to it as possible, writing "idiomatic" code.
We'll start with parsing lambda terms such as λc.c(λx.x), λx.x, y or λa.(a b c) into an abstract syntax tree. We also need to define such a tree.
Lambda terms can contain other lambda terms, which makes it tempting to use a recursive abstract syntax tree. This could be encouraged or discouraged depending on your language.
Parsing will be the hardest and least interesting part. Our general strategy will be to implement bits of code that parse each of the three lambda term variants, composed into a backtracking parser: it attempts to parse each variant by calling itself recursively, picking another alternative on failure.
Backtracking, recursive descent parsers are easy to reason about and write but can get very slow. If you follow along I suggest doing whatever is simplest to parse your strings into AST nodes.
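The backtracking idea can be sketched in a few lines of Haskell (hypothetical names, not the parser we'll actually build): a parser maps input to Maybe (result, remaining input), and because parsers never consume their argument destructively, "backtracking" is just handing the same string to the next alternative.

```haskell
-- A parser returns the parsed value and the unconsumed rest of the
-- input, or Nothing on failure.
type Parser a = String -> Maybe (a, String)

char :: Char -> Parser Char
char c (x : xs) | x == c = Just (c, xs)
char _ _ = Nothing

-- Try p; on failure, fall back to q on the very same input.
orElse :: Parser a -> Parser a -> Parser a
orElse p q input = case p input of
  Nothing -> q input
  ok      -> ok

main :: IO ()
main = print (orElse (char 'a') (char 'b') "banana")
```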
I won't derive a formal grammar for lambda calculus myself; here's a simple EBNF grammar according to the University of Maryland:
x ::= any variable name
e ::= x | ( e e ) | ( λ ( x ) e )
v ::= ( λ ( x ) e )
A nice feature we'd love to support that isn't present in this specific grammar is lists of applications, such as (a b c d e) -> ((((a b) c) d) e). You can definitely write lambda calculus without it, and it might be challenging to implement depending on how you write your parser.
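One way to support such lists, sketched here with a hypothetical minimal Expr type, is to parse a flat sequence of terms and fold it to the left:

```haskell
import Data.List (foldl')

data Expr = Var String | Appl Expr Expr
  deriving Show

-- (a b c) parses as a head plus a flat list of arguments; folding
-- left with Appl rebuilds the left-associative chain ((a b) c).
chain :: Expr -> [Expr] -> Expr
chain = foldl' Appl

main :: IO ()
main = print (chain (Var "a") [Var "b", Var "c"])
```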
We'll implement a function to substitute a variable inside an expression. We'll use this subst function to implement beta reduction, and lastly we'll provide a method that reduces terms until no further reduction can happen (or until we're stuck).
Care must be taken in the subst function to avoid substituting variables captured by lambda abstractions.
This is quite fun, and during the implementation we'll have to think deeply about how programming languages in general implement "evaluation".
Lastly, we'll cheat and let the user actually name a lambda term for later, like this: id = λx.x, and write a second evaluator function that takes into account these user defined variables, allowing them to write id 3 and get 3 as expected.
In both my implementations I defined a new data type for declarations and used some kind of HashMap to model the environment.
If we did great on the first steps this should be a fairly straightforward extension to our current code, and will teach us a little about stateful computation (the state being the user-defined env) when implementing the "resolve" function.
Care must be taken not to substitute captured variables: e.g. \id.id mustn't change even if {id: \x.x} is in our env.
The best part about naming lambda expressions is that it lets us write massive lambda terms without going insane, by composing together many simple named terms.
Haskell was practically invented to let undergraduates write lambda calculus interpreters, a joke that isn't too far from reality as its main influence, Standard ML (Meta Language), was in fact engineered to write compilers and interpreters.
I also have prior experience with Haskell and know for a fact that I was once able to write a lambda calculus interpreter with it. I'm sort of rusty, though.
Haskell is sadly in a sorry state where the most famous compiler essentially defines the language, as the old Haskell 2010 standard remains the last official reference.
The ecosystem, which is entirely built on top of GHC, has evolved and thrived with compiler-specific extensions. Any serious and modern Haskell code requires 10+ compiler extensions to work. The most popular extensions were supposed to be included in a Haskell 2020 standard, but interest died out. Contributors are wanted.
Sometimes I have trouble finding official documentation. The standard library has really fancy and clean pages on Hackage, and the official documentation for the language is here, I guess. The standard is 10 years old. I wish someone could point me to something akin to the beautiful Rust reference book, where I can get a quick and friendly refresher on the language basics.
This is the boring AST we come up with. We're using the text package (imported qualified as T), which isn't standard library but is widely recommended in place of default Strings.
Before Rust, Haskell pioneered the practice of having many confusing string types, since the default string type is a lazy, linked list of single characters. A lazy, linked list of single characters is great for writing streaming, lazy functions, but not so great for memory layout and almost never what you want to use to represent a string.
T.Text is a boring UTF-8 byte array (since version 2.0 of text).
type Ident = T.Text
data Expr = Var Ident | Appl Expr Expr | Abs Ident Expr
deriving Eq
Haskell does not complain about the recursive nature of our data structure. It has three |-separated variants: the variable, the application, and the abstraction.
Instead of reaching for a parser library (probably the best idea) we'll try to implement
an instance of the Read typeclass. This lets us call the function read :: Read a => String -> a with our custom type as return argument, resolving to our read implementation here.
This led to the following implementation, which I'm happy with despite not being completely sure about it.
instance Read Expr where
readPrec = readExpr
readListPrec = readListPrecDefault
readExpr :: ReadPrec (Expr)
readExpr = readAppl <++ readSingle
readSingle :: ReadPrec (Expr)
readSingle = parens readLambda <++ (fmap Var readIdent) <++ paren readExpr
readAppl :: ReadPrec (Expr)
readAppl = do
first <- readSingle
lift skipSpaces
items <- lift $ many1 $ do
x <- readPrec_to_P readSingle 0
skipSpaces
return x
return $ foldl' Appl first items
readLambda :: ReadPrec (Expr)
readLambda = do
_ <- choice [readChar 'λ', readChar '\\']
ident <- readIdent
_ <- readChar '.'
lift skipSpaces
rest <- readExpr
return $ Abs ident rest
readIdent :: ReadPrec T.Text
readIdent = do
ident <- lift (munch1 isAlphaNum)
return $ T.pack ident
readChar :: Char -> ReadPrec Char
readChar c = lift (char c)
The suggested way to implement read was using the ReadPrec monad: a function with type ReadPrec (Type) is a parser that returns said type, and you can compose them using do notation and other functions like choice, many, or the <++ operator.
Our readExpr is a composition of simpler and simpler parsers, down to readChar and readIdent, which match a single character and an identifier respectively.
All is good except for readAppl: it's very annoying to create a left-associative chain of operations using a recursive descent parser: we have to parse a b c as ((a b) c), meaning the first term could be "very deep". To fix this, we actually parse a full list of terms, then we foldl' over that list to build the final left-associative chain.
Inside of do blocks we can also use the var <- readExpr syntax, which means "read an expression and bind the result, or fail and short-circuit the whole block". A lot of control flow is encoded this way!
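That short-circuiting can be seen in isolation with the simpler Maybe monad (just an illustration, not part of the parser):

```haskell
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv a b = Just (a `div` b)

calc :: Maybe Int
calc = do
  x <- safeDiv 10 2 -- succeeds: x is bound to 5
  y <- safeDiv x 0  -- fails: the rest of the block never runs
  return (x + y)

main :: IO ()
main = print calc -- the whole computation collapses to Nothing
```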
There is some confusing interplay with the older parsing types ReadS and ReadP: ReadS is the one mentioned in the standard, but it's much slower than the efficient, GHC-only ReadPrec. The lift function translates parsers from the underlying ReadP monad into ReadPrec, like skipSpaces. This was kind of confusing and just adds clutter.
It's worth noting that most Haskell parsing libraries also make use of do notation and monads, providing a very clean and nice interface. I used attoparsec a lot, and other parsec clones.
I'm interested in hearing from a seasoned Haskeller what I could've done better here.
sub :: Ident -> Expr -> Expr -> Expr
sub id0 arg (Var id1) | id0 == id1 = arg
sub _ _ (Var id1) = Var id1
sub id0 _ (Abs id1 expr) | id0 == id1 = Abs id1 expr
sub id0 arg (Abs id1 expr) = Abs id1 (sub id0 arg expr)
sub id0 arg0 (Appl expr arg1) = Appl (sub id0 arg0 expr) (sub id0 arg0 arg1)
-- normal order
eval :: Expr -> Expr
eval (Appl (Abs ident expr) arg) = sub ident arg expr
eval (Appl expr arg) = Appl (eval expr) arg
eval x = x
steps :: Expr -> [Expr]
steps = fix (\rec x -> if x == eval x then [x] else x:(rec $ eval x))
The substitution function is simple: it only needs to avoid captured variables.
When implementing the evaluation function, a question naturally arises: we need to find a suitable application to reduce, but what if there are multiple candidates? In the following term we have two:
((λx.λy.y) ((λx.x x) (λx.x x))) b
We could apply the leftmost, outermost reduction by substituting ((λx.x x) (λx.x x)) in place of x inside λy.y. There's no x in the body, so the argument gets discarded:
((λx.λy.y) ((λx.x x) (λx.x x))) b ->
((λy.y)) b ->
b
Or we could stubbornly apply the innermost, leftmost beta reduction available to us, in the (λx.x x) (λx.x x) bit. As we saw earlier this evaluates to itself, giving us the original expression unchanged.
((λx.λy.y) ((λx.x x) (λx.x x))) b ->
((λx.λy.y) ((λx.x x) (λx.x x))) b ->
((λx.λy.y) ((λx.x x) (λx.x x))) b ->
... no progress
The order in which we pick the reduction can actually dictate whether we'll ever reach a final state. Good news: always picking the leftmost, outermost reduction WILL take you to the same final term, if such a term exists.
Most programming languages tend to evaluate the innermost, leftmost function first: when you call a function, all its arguments are evaluated before the function body. Haskell is one of the few languages where this isn't true and bodies are "substituted" before their arguments.
After the interpreter is done, you'll be able to play around with different evaluation strategies and see how they differ.
The steps function will repeatedly call eval until x == eval x. There's a simpler way to implement this but I admittedly just wanted to use fix here.
There is no function to get the normal form, as steps suffices:
normal :: Expr -> Expr
normal = last . steps
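To see the whole evaluator in one place, here's a self-contained sketch of the same sub/eval/steps pipeline, using String identifiers instead of T.Text so it runs with no extra packages. It checks that normal order happily discards a diverging argument, the looping term we met earlier:

```haskell
data Expr = Var String | Appl Expr Expr | Abs String Expr
  deriving (Eq, Show)

-- Substitute arg for the identifier, stopping at shadowing binders.
sub :: String -> Expr -> Expr -> Expr
sub i a (Var j) | i == j    = a
                | otherwise = Var j
sub i _ (Abs j e) | i == j = Abs j e
sub i a (Abs j e) = Abs j (sub i a e)
sub i a (Appl f x) = Appl (sub i a f) (sub i a x)

-- One normal-order step: reduce the leftmost, outermost redex.
eval :: Expr -> Expr
eval (Appl (Abs i e) a) = sub i a e
eval (Appl f a) = Appl (eval f) a
eval x = x

steps :: Expr -> [Expr]
steps x = if x == eval x then [x] else x : steps (eval x)

main :: IO ()
main = do
  let omega = Abs "x" (Appl (Var "x") (Var "x"))
      loop  = Appl omega omega
      -- ((λx.λy.y) Ω) b: the looping argument is never evaluated
      term  = Appl (Appl (Abs "x" (Abs "y" (Var "y"))) loop) (Var "b")
  print (last (steps term))
```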
At this point we can test our functions with GHCi or a simple main, since we have all the components for basic read -> eval -> print.
However, it gets very tedious to write long lambda terms manually. Let's implement a simple dialect of our lambda calculus for interactive use: every line is either an expression to evaluate or a definition.
The syntax for definitions will be simply name = <lambda term>. We define a new data type to reflect this that can either be a declaration or a bare expression, along with Read, Show implementations.
data EnvExpr = Decl Ident Expr | Base Expr
instance Read EnvExpr where
readPrec = readEnvExpr
readListPrec = readListPrecDefault
instance Show EnvExpr where
show (Decl ident expr) = printf "%s = %s" ident (show expr)
show (Base expr) = show expr
readEnvExpr :: ReadPrec (EnvExpr)
readEnvExpr = readDecl <++ (Base <$> readExpr)
readDecl :: ReadPrec (EnvExpr)
readDecl = do
ident <- readIdent
lift skipSpaces
_ <- readChar '='
lift skipSpaces
expr <- readExpr
return $ Decl ident expr
Now we just need to store definitions as we parse them, and to implement a new function that will replace all variables (as in (const id a)) with the terms stored in our env (like {const: \x.\y.x, id: \x.x}) until we are left with a bare expression again (like ((\x.\y.x) (\x.x) a)).
I modeled the environment using a classic Data.Map from containers. To model stateful computation, I just had to use the State monad.
type Env = Map Ident Expr
readAndStore :: [String] -> [String]
readAndStore = fst . evalAndStore empty
where evalAndStore = flip inner
inner = runState . read_store_monad
read_store_monad :: [String] -> State Env [String]
read_store_monad [] =
return []
read_store_monad (!x:xs) = do
!x' <- case ((read x)::EnvExpr) of
Decl i e -> (\s -> s ++ " -> " ++ (show . last .steps $ e)) <$> put_print i e
Base e -> (\s -> show s ++ " -> " ++ (show . last . steps $ s)) <$> eval_env e
xs'<- read_store_monad xs
return $ x':xs'
put_print :: Ident -> Expr -> State Env String
put_print i e = do
env <- get
put (insert i e env)
pure $ show $ Decl i e
eval_env :: Expr -> State Env Expr
eval_env (Var i) = do
env <- get
case MP.lookup i env of
Just x -> return x
Nothing -> return $ Var i
eval_env (Appl a b) = do
a' <- eval_env a;
b' <- eval_env b;
return $ Appl a' b'
eval_env (Abs k b) = do
env <- get
b' <- case MP.lookup k env of
Just x -> do
put (delete k env)
eval_env x
Nothing -> eval_env b
return $ Abs k b'
The State A B monad lets us write functions that return B and have read/write access to a state A.
Inside of do notation we can <- get the state or put (<new_state>) as we see fit, and return our desired types.
eval_env does the heavy lifting of inserting new declarations in the environment or removing them temporarily when they're captured by a lambda expression.
put_print inserts and prints in a single operation: when a user types id = \x.x in the repl they'll receive a printout of the declaration again.
read_store_monad performs the fun task of reading the whole input line by line and returning a list of output lines inside a State monad. Since Haskell lists are lazy, these will actually work as streams if needed.
readAndStore doesn't have the State monad in its signature: it takes a list of lines as input, runs the state monad to perform the computation, and returns the output.
Now, because the output of our State monad is a list, Haskell won't bother computing the full list but will just fetch elements as they are needed. Laziness is preserved in the State monad and in our functions too, meaning they work with streaming data for free.
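A small demonstration of this (with hypothetical names, assuming the lazy State monad from mtl used above): a stateful line-numbering pass can consume an infinite input, because only the demanded prefix of the output list is ever computed.

```haskell
import Control.Monad.State

-- Prefix each line with a counter threaded through the State monad.
numberLines :: [String] -> State Int [String]
numberLines [] = return []
numberLines (x : xs) = do
  n <- get
  put (n + 1)
  rest <- numberLines xs
  return ((show n ++ ": " ++ x) : rest)

main :: IO ()
main =
  -- The input is infinite, yet take 2 terminates: laziness survives
  -- the stateful computation.
  print (take 2 (evalState (numberLines (repeat "hi")) 1))
```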
prompt :: String
prompt = "λ "
main :: IO ()
main = do
putStr prompt
hFlush stdout
hSetBuffering stdout NoBuffering
interact $ prompt_inner
prompt_inner :: String -> String
prompt_inner = foldMap (\s -> s ++ ("\n" ++ prompt)) . readAndStore . lines
In our IO module we invoke readAndStore inside another function which is just String -> String. These two strings are the whole output and whole input of our program. They're also default Haskell strings: a lazy linked list of single characters.
Feeding the function to interact will hook it to stdin -> stdout, and thanks to laziness it will start producing output as soon as enough input is available.
Monads are very nice and not just some gimmick. I've used three just in this small example (State, ReadPrec, and IO!) for wildly different contexts, and do notation makes sense in each of them.
The tooling has improved tremendously and is on par with Rust: while stack has always been very mature, easy, and an inspiration for cargo and friends, the HLS (Haskell Language Server) is now working great and out-of-the-box for most projects.
This style of code is still mostly unreadable for most programmers used to other programming paradigms. Could you follow in your language of choice?
Getting started is okay: I used mise to handle zig and zls installation (0.16.0).
Zig has a single page html as language reference and ships the standard library's documentation offline via zig std.
I consulted the language reference, the standard library, and my friend TRGWII for Zig wisdom. I wrote an initial version, then he gave me a lot of stylistic suggestions and sent me a modified parser design. I took that design and finished the interpreter.
const Ident = struct {
name: []const u8,
fn deinit(_: Ident, _: std.mem.Allocator) void {}
};
const Appl = struct {
f: *AST,
a: *AST,
fn deinit(self: Appl, alloc: std.mem.Allocator) void {
self.f.deinit(alloc);
alloc.destroy(self.f);
self.a.deinit(alloc);
alloc.destroy(self.a);
}
};
const Abs = struct {
v: []const u8,
e: *AST,
fn deinit(self: Abs, alloc: std.mem.Allocator) void {
self.e.deinit(alloc);
alloc.destroy(self.e);
}
};
pub const AST = union(enum) {
ident: Ident,
appl: Appl,
abs: Abs,
/// many methods here
The lack of a string type is frustrating until you actually embrace the []const u8 oriented philosophy, where actually everything is just bytes and you assign meaning to them.
Usually my compiler greatly aids me in this task; in Zig the underlying representation is laid bare: my identifier is once again just a list of numbers.
At some point I must've learned that a C char is a number before being a character. No facade here.
I could not find anything helpful in the standard library for building parsers, so I made my own. My initial design was just a recursive descent parser built from a bunch of functions that take a string and return (Result, how_much_of_input_I_parsed). TRGWII suggested using a Parser struct which takes an allocator just once, keeping the position inside the input string as internal state. Fair enough.
pub const Parser = struct {
src: []const u8,
i: usize = 0,
alloc: std.mem.Allocator,
// many methods and entire AST definition actually
fn ident(self: *Self) ParseError!Ident {
if (self.i >= self.src.len) return ParseError.EOF;
var i = self.i;
for (self.src[self.i..]) |c| {
if (std.ascii.isAlphabetic(c) or c == '-' or c == '_') {
i += 1;
} else {
break;
}
}
if (self.i == i) return ParseError.MissingId;
const name = self.src[self.i..i];
self.i = i;
return .{ .name = name };
}
fn abs(self: *Self) !Abs {
const restore = self.i;
errdefer self.i = restore;
if (self.src.len <= self.i) return error.EOF;
if (self.src[self.i] == '\\')
self.i += 1
else if (std.mem.startsWith(u8, self.src[self.i..], "λ"))
self.i += 2
else
return error.MissingLambda;
const id = try self.ident();
if (self.src.len <= self.i) return error.EOF;
if (self.src[self.i] != '.') return error.MissingDot;
self.i += 1;
const ast = try self.alloc.create(AST);
errdefer self.alloc.destroy(ast);
ast.* = try self.parse();
return .{ .v = id.name, .e = ast };
}
// other methods are omitted for brevity
While in the Haskell version this was hidden inside control flow, here we backtrack very explicitly: at the start of each parsing function we store the current i, our index into the source string. With errdefer, i is restored to its initial value on error: if parsing fails, we go back to where we started (maybe to try another variant?).
I thought of my AST as immutable, owning its own subnodes. My design turned out to be very noisy, with a lot of allocations and frees taking place. I know this is not the best Zig style. The strings in identifiers are not owned: they belong to the input string, so they are not freed or handled by the parser.
In fact, the .deinit method for identifiers kind of doesn't do anything.
This is the proper way in Zig to do a switch on variants: implement the ".deinit" method or anything for each variant, then write something like this:
fn deinit(self: AST, allocator: std.mem.Allocator) void {
switch (self) {
inline else => |variant| variant.deinit(allocator),
}
}
This will get inlined to a proper switch where we call the .deinit method of every variant.
I then wrote my equivalent functions to perform a single substitution, do a single evaluation step, and reduce until normal form:
pub fn subst(self: AST, id: Ident, arg: *const AST, allocator: std.mem.Allocator) error{OutOfMemory}!*AST {
// std.debug.print("I put {f} in {f} in place of {s}\n", .{ arg, self, id.name });
switch (self) {
.abs => |a| {
var cloned_expr = try allocator.create(AST);
cloned_expr.* = self;
if (!std.mem.eql(u8, id.name, a.v)) {
cloned_expr.abs.e = try a.e.subst(id, arg, allocator);
cloned_expr.abs.v = a.v;
} else {
cloned_expr.abs.e = try a.e.clone(allocator);
cloned_expr.abs.v = a.v;
}
return cloned_expr;
},
.appl => |app| {
var cloned_expr = try allocator.create(AST);
cloned_expr.* = self;
cloned_expr.appl.f = try app.f.subst(id, arg, allocator);
cloned_expr.appl.a = try app.a.subst(id, arg, allocator);
return cloned_expr;
},
.ident => |idn| {
if (std.mem.eql(u8, id.name, idn.name)) {
return try arg.clone(allocator);
} else {
var cloned_expr = try allocator.create(AST);
cloned_expr.* = self;
cloned_expr.ident = Ident{ .name = self.ident.name };
return cloned_expr;
}
},
}
}
const EvalError = error{ MalformedAst, OutOfMemory };
pub fn eval(self: AST, allocator: std.mem.Allocator) EvalError!*AST {
switch (self) {
.appl => |app| {
switch (app.f.*) {
.abs => |a| {
return try a.e.subst(.{ .name = a.v }, app.a, allocator);
},
else => {
var cloned_expr = try allocator.create(AST);
cloned_expr.* = self;
cloned_expr.appl.a = try app.a.clone(allocator); // deep clone arg
cloned_expr.appl.f = try app.f.eval(allocator); // eval function body
return cloned_expr;
},
}
},
.abs => |a| {
var cloned_expr = try allocator.create(AST);
cloned_expr.* = self;
cloned_expr.abs.e = try a.e.eval(allocator); // eval function body
cloned_expr.abs.v = a.v;
return cloned_expr;
},
else => {
return try self.clone(allocator);
},
}
}
pub fn normal(self: *AST, allocator: std.mem.Allocator) EvalError!*AST {
var res = try self.eval(allocator);
var prev = try self.clone(allocator);
while (!res.eql(prev.*)) {
prev.deinit(allocator);
prev = res;
res = try res.eval(allocator);
}
prev.deinit(allocator);
return res;
}
They're very similar to what I did in Haskell, just written in an imperative style and with more allocations than I would like.
Just like in Haskell, I added a new parser and AST for declarations. This parser also keeps track of its position in a source string with an i variable. You can actually mix and match them with no issues if you create two parsers with the same .src string and keep track of i. Since Zig offers no encapsulation for fields, you can just edit i to your liking.
I also integrated the environment inside the parser, which is a bit of a cheat. I used a boring std.StringHashMap and had to create an OwnedAST version of my previous AST to put inside the env.
The parser has a resolve() method which takes a lambda term and returns one with all variables "resolved". To avoid substituting captured variables this time, I modify the env, removing the term and adding it back when I'm done.
This is silly; the usual technique is to create a more sophisticated env (such as a linked list of environments), but it worked fine for my small Zig program.
pub const EnvParser = struct {
src: []const u8,
i: usize = 0,
alloc: std.mem.Allocator,
env: std.StringHashMap(Parser.OwnedAST),
parser: Parser,
const Self = @This();
const ParseError = error{MissingEqual} || Parser.ParseError || std.mem.Allocator.Error;
const EvalError = Parser.AST.EvalError;
pub fn init(src: []const u8, alloc: std.mem.Allocator) Self {
return Self{ .src = src, .i = 0, .alloc = alloc, .env = std.StringHashMap(Parser.OwnedAST).init(alloc), .parser = Parser{ .alloc = alloc, .src = src, .i = 0 } };
}
pub fn deinit(self: *Self) void {
var iter = self.env.iterator();
while (iter.next()) |entry| {
entry.value_ptr.deinit(self.alloc);
self.alloc.free(entry.key_ptr.*);
}
self.env.deinit();
}
pub fn loadnew(self: *Self, newline: []const u8) void {
self.i = 0;
self.src = newline;
self.parser.src = newline;
self.parser.i = 0;
}
pub const DEFAST = union(enum) {
definition: Definition,
bare: Parser.AST,
pub fn normal(self: DEFAST, allocator: std.mem.Allocator) EvalError!*Parser.AST {
switch (self) {
inline else => |variant| return variant.normal(allocator),
}
}
pub fn format(self: DEFAST, writer: anytype) !void {
try switch (self) {
.definition => |def| writer.print("{s} = {f}", .{ def.name, def.exp }),
.bare => |bare| bare.format(writer),
};
}
};
pub fn parse(self: *Self) EvalError!DEFAST {
if (lambda_debug) {
depth += 1;
std.debug.print(".{s} {s}: with {s} and s {d}\n", .{ depth_str[0 .. depth * 2], @src().fn_name, self.src[self.i..], self.i });
}
defer if (lambda_debug) {
depth -= 1;
};
if (self.definition()) |def| {
return .{ .definition = def };
} else |_| {
self.parser.i = 0; // reset parser
if (self.parser.parse()) |ast| {
return .{ .bare = (try self.resolve(&ast)).* };
} else |err| return err;
}
}
pub fn resolve(self: *Self, expr: *const Parser.AST) EvalError!*Parser.AST {
switch (expr.*) {
.ident => |id| {
if (self.env.contains(id.name)) {
const env_expr = self.env.get(id.name) orelse return EvalError.NotInEnv;
return env_expr.ast.clone(self.alloc);
} else {
return expr.clone(self.alloc);
}
},
.appl => |a| {
const cloned_expr = try self.alloc.create(Parser.AST);
errdefer {
self.alloc.destroy(cloned_expr);
}
cloned_expr.* = expr.*;
cloned_expr.appl.f = try self.resolve(a.f);
cloned_expr.appl.a = try self.resolve(a.a);
return cloned_expr;
},
.abs => |a| {
const cloned_expr = try self.alloc.create(Parser.AST);
errdefer {
self.alloc.destroy(cloned_expr);
}
if (self.env.contains(a.v)) {
const cache = self.env.getEntry(a.v) orelse return EvalError.NotInEnv;
if (!self.env.remove(a.v)) return EvalError.NotInEnv;
cloned_expr.* = expr.*;
cloned_expr.abs.v = a.v;
cloned_expr.abs.e = try self.resolve(a.e);
try self.env.put(cache.key_ptr.*, cache.value_ptr.*);
} else {
cloned_expr.* = expr.*;
cloned_expr.abs.v = a.v;
cloned_expr.abs.e = try self.resolve(a.e);
}
return cloned_expr;
},
}
}
In the main we create an allocator to give to the library. We don't really care about freeing objects since we are using the amazing arena allocator, which will free everything in one swoop with arena.deinit();.
We also set up a lot of buffers to invoke the new and confusing IO interface, opening stdin() and stdout() manually and reading until a newline for user input. Then parse, eval, print.
const std = @import("std");
const lambda = @import("root.zig");
test {
std.testing.refAllDecls(@This());
}
const prompt = "λ ";
const lambda_debug = true;
pub fn main(init: std.process.Init) !void {
var arena = std.heap.ArenaAllocator.init(init.gpa);
const io = init.io;
defer arena.deinit();
const allocator = arena.allocator();
var stdin_buf: [1024]u8 = undefined;
var stdout_buf: [1024]u8 = undefined;
var stdin = std.Io.File.stdin().reader(io, &stdin_buf);
var stdout = std.Io.File.stdout().writer(io, &stdout_buf);
var parser = lambda.EnvParser.init("", allocator);
try stdout.interface.writeAll(prompt);
try stdout.interface.flush();
while (true) {
var line_buffer: [1024]u8 = undefined;
var w: std.Io.Writer = .fixed(&line_buffer);
const line_length = try stdin.interface.streamDelimiterLimit(&w, '\n', .unlimited);
stdin.interface.toss(1);
const input_line = line_buffer[0..line_length];
parser.loadnew(input_line);
var parsed = try parser.parse();
try stdout.interface.writeAll(try std.fmt.allocPrint(allocator, "{f} -> {f}\n{s}", .{ parsed, try parsed.normal(allocator), prompt }));
try stdout.interface.flush();
}
}
The Zig version demanded at least one new test for every piece of code I wrote. Writing tests runs your code and gives you a chance to catch memory leaks and other common memory bugs. If you don't write tests, it's very possible that your code won't even compile when you begin using it, as unused code is entirely discarded during compilation. This also means your Zig code is only as good as your tests, which is true for a lot of languages. I don't think I'd be able to write Zig without a line of tests for each line of code (or whatever it takes to reach 100% coverage).
I thought I'd have a much harder time than I did: once you accept Zig's hubris and design, it's a very powerful and self-contained language! I haven't even tried the integrated fuzzer yet. I feel I could be more productive in Zig after this experiment than in C, where you get the same problems but more awkward tools to solve them.
The program I picked was intentionally ill-suited to Zig's imperative style and very easy in Haskell, and despite this I really enjoyed the experience.
This was very fun and a lot could be improved. Please contact me if you have strong opinions about anything I said.
In the next episodes we will: