Table of contents In my previous post, I introduced the simple programming language. Near the end of that post, I mentioned that the parser isn't very efficient. Let's try to optimize it! If you're just curious about the title of the article, you can skip ahead to that section. (Re-)introducing simp First, a quick recap on the programming language being implemented here: fn fibonacci(n) { if n < 2 { n } else { fib(n-1) + fib(n-2) } } let result = fibonacci(30); It's a very simple, language consisting of: Variables Functions Various kinds of expressions Conditionals It's not particularly useful, but it has enough stuff to be a good testing ground for experimenting with parsers. Simple AST Currently, the implementation uses a recursive descent parser which produces an abstract syntax tree (AST). The AST represents code in a more structured form, which is easier to work with in subsequent compilation passes. Here is a small subset of the full AST definition: enum Stmt { Expr(Box<Expr>), // ... other statements } enum Expr { Block(Box<ExprBlock>), // ... more expressions } struct ExprBlock { body: Vec<Stmt>, } It highlights a few things: Each kind of Stmt and Expr is in a Box, to allow these types to be recursive. Sequences are stored in Vecs. This kind of design is simple and flexible. Unfortunately, it also uses a lot of memory, and each syntax node requires many separate allocations. A note on benchmarking Benchmarking is difficult. It's very easy to introduce bias or measure the wrong thing. We'll be relying on two metrics: Throughput in lines of code per second Maximum memory usage The scenario will be to parse a set of files with different sizes, starting at a few kB and going up to 100 MB. We're effectively measuring how much time it takes to construct the AST, and how much memory it uses. We won't be measuring anything beyond that. A 100 MB file has nearly 10 million lines of code! The reason we go that high is to prevent the CPU from storing the entire file and...
First seen: 2025-12-10 17:34
Last seen: 2025-12-10 21:35