Using PEGs in Janet

https://news.ycombinator.com/rss Hits: 9
Summary

How-To: Using PEGs in Janet

Janet is a small, Lisp-like language. Unlike most programming languages, it offers no support for regular expressions. Instead, Janet supports parsing expression grammars, or PEGs.

A PEG in Janet is usually described by an associative data structure that lists a series of rules. For each rule, the key is the name of the rule and the value is a description of the string that the rule will match. What makes PEGs especially powerful is that rules can refer to other rules (including recursively) and can run arbitrary functions.

Let’s see how we can use a PEG to parse a simplified subset of HTML. We’ll use sequences, choices, captures (both compiled and match-time), replacements, drops and back-references. It’s going to be fun.

Steps

Step 1. Define :main rule

Janet begins parsing using the :main rule. So let’s start with that:

    '{:main (* :tagged -1)}

This rule defines a pattern consisting of a sequence (represented by *) of the rule :tagged and the value -1. It matches if the rule :tagged matches and the string then ends (the value -1 matches only at the end of the string).

Step 2. Define :tagged rule

If we try to use this grammar now, Janet will complain that the rule :tagged is not defined, so let’s define that next:

    '{:main (* :tagged -1)
      :tagged (* :open-tag :value :close-tag)}

This is pretty straightforward. Our :tagged rule consists of an opening tag, a value of some kind and a closing tag.

Step 3. Define :open-tag rule

    '{:main (* :tagged -1)
      :tagged (* :open-tag :value :close-tag)
      :open-tag (* "<" (capture :w+ :tag-name) ">")}

We name the capture so that we can refer back to it in our closing tag rule. I went with :tag-name but you can choose whatever you like.

Step 4. Define :close-tag rule

Up to this point, we’ve been adding rules in the order that they’re processed. Let’s deviate from that here and instead define our next rule to match closing tags:

    ~{:main (* :tagged -1) :tagged (* :o...
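The summary breaks off before the :value and :close-tag rules are shown, so the grammar from Steps 1 to 3 cannot be run as it stands. The following is a minimal sketch that fills the gap so the visible rules can be tried out. The :value and :close-tag definitions here are assumptions made for illustration and are not necessarily the ones the article goes on to use: :value simply captures everything up to the next "<", and :close-tag uses Janet's backmatch to require the same text that was captured under the :tag-name tag.

```janet
# Rules :main, :tagged and :open-tag are taken from Steps 1 to 3.
# Rules :value and :close-tag are ASSUMED stand-ins for illustration only.
(def grammar
  '{:main (* :tagged -1)
    :tagged (* :open-tag :value :close-tag)
    :open-tag (* "<" (capture :w+ :tag-name) ">")
    # Assumption: a value is any run of characters other than "<".
    :value (capture (any (if-not "<" 1)))
    # Assumption: the closing tag must repeat the text tagged :tag-name.
    :close-tag (* "</" (backmatch :tag-name) ">")})

# peg/match returns an array of captures on success and nil on failure.
(pp (peg/match grammar "<p>Hello</p>"))  # prints @["p" "Hello"]
(pp (peg/match grammar "<p>Hello</b>"))  # prints nil (closing tag differs)
```

Because :main ends with -1, trailing input after the closing tag also causes the match to fail, which is the behaviour Step 1 describes.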

First seen: 2025-10-19 02:59

Last seen: 2025-10-19 12:01