Writing a C compiler in 500 lines of Python

Posted August 30, 2023

A few months ago, I set myself the challenge of writing a C compiler in 500 lines of Python, after writing my SDF donut post. How hard could it be? The answer was: pretty hard, even when dropping quite a few features. But it was also pretty interesting, and the result is surprisingly functional and not too hard to understand!

There's too much code for me to comprehensively cover in a single blog post, so I'll just give an overview of the decisions I made, things I had to cut, and the general architecture of the compiler, touching on a representative piece of each part. Hopefully after reading this post, the code is more approachable!

Decisions, decisions

The first, and most critical, decision was that this would be a single-pass compiler. 500 lines is too spare to be defining and transforming an abstract syntax tree! What does that mean?

Most compilers: faffing around with syntax trees

Well, most compilers' internals look something like this:

The tokens get lexed, then a parser runs over them and builds pretty little syntax trees:

    # hypothetical code, not from anywhere
    def parse_statement(lexer) -> PrettyLittleSyntaxTree:
        ...
        if type := lexer.try_next(TYPE_NAME):
            variable_name = lexer.next(IDENTIFIER)

            if lexer.try_next("="):
                initializer = parse_initializer(lexer)
            else:
                initializer = None

            lexer.next(SEMICOLON)

            return VariableDeclarationNode(
                type = type,
                name = variable_name,
                initializer = initializer,
            )
        ...

    # much later...
    def emit_code_for(node: PrettyLittleSyntaxTree) -> DisgustingMachineCode:
        ...
        if isinstance(node, VariableDeclarationNode):
            slot = reserve_stack_space(node.type.sizeof())
            add_to_environment(node.name, slot)
            if node.initializer is not None:
                register = emit_code_for(node.initializer)
                emit(f"mov {register}, [{slot}]")
        ...

The important thing here is that there are two passes: first the parsing builds up a syntax tree, then a second pass chews that tree up and turns it into machine code....
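
To make the two-pass shape concrete, here is a minimal runnable sketch that handles arithmetic expressions rather than C. Everything in it is an assumption for illustration: the tuple-based tree, the `compile_two_pass` helper, and the `push`/`add`/`mul` stack-machine mnemonics are invented here and are not taken from the compiler this post describes.

```python
import re

# Hypothetical stack-machine mnemonics, chosen only for this sketch.
OPCODES = {"+": "add", "-": "sub", "*": "mul", "/": "div"}

def tokenize(source):
    # Integer literals and single-character operators; whitespace is skipped.
    return re.findall(r"\d+|[-+*/()]", source)

# Pass 1: turn the tokens into a syntax tree made of nested tuples.
def parse_expression(tokens, pos=0):
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_term(tokens, pos):
    node, pos = parse_atom(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_atom(tokens, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_atom(tokens, pos):
    if tokens[pos] == "(":
        node, pos = parse_expression(tokens, pos + 1)
        assert tokens[pos] == ")", "expected closing parenthesis"
        return node, pos + 1
    return ("num", int(tokens[pos])), pos + 1

# Pass 2: walk the finished tree and emit instructions.
def emit_code_for(node, out):
    if node[0] == "num":
        out.append(f"push {node[1]}")
    else:
        op, left, right = node
        emit_code_for(left, out)
        emit_code_for(right, out)
        out.append(OPCODES[op])
    return out

def compile_two_pass(source):
    tree, _ = parse_expression(tokenize(source))  # pass 1: text -> tree
    return emit_code_for(tree, [])                # pass 2: tree -> code

print(compile_two_pass("1 + 2 * 3"))
```

Note that no instruction is emitted until the whole tree exists: `parse_expression` returns before `emit_code_for` starts, which is exactly the separation the hypothetical `parse_statement` and `emit_code_for` above have.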