How Janet's PEG module works

https://news.ycombinator.com/rss Hits: 10
Summary

In making Janet, I was inspired by the REBOL Parse module and LPeg to include a versatile library for parsing text and any sequence of bytes based on Parsing Expression Grammars, or PEGs. I was also willing to exclude other forms of parsing tools such as regular expressions from the core library, as PEGs are more general than regular expression, or easier to read, and can capture error locations when matching text. The development of PEGs in Janet went particularly smoothly, made possible by the simple PEG formalism, Janet's basic data structures, and a very straight-forward subroutine threaded interpreter. While much of Janet's PEG module could be implemented in any language, Janet's C API and rich set of core data structures made it easy to implement PEGs in Janet. This post is about how the Janet PEG engine is implemented; it is not a guide on using pegs. Although the code is written in Janet, It can be translated directly to most programming languages as we make no special use of any data structures besides arrays and hashtables. For information about using Janet's PEG module, see the Janet PEG documentation. A Quick PEG Overview The Parsing Expression Grammar formalism has it's roots in a 2004 Paper by Bryan Ford, who presents both the model and an algorithm for recognizing any PEG grammar in linear time. Our PEG engine will not parser all patterns in linear time, but will use similar semantics for defining our grammar. One difference between PEGs and more traditional parser generators, such as those used by YACC and Bison, is the ordered choice operator. For the purposes of this post, + is the ordered choice operator. For example, using a pseudo-syntax for PEG grammars where the MAIN rule is the entry point of the grammar, we can write a grammar that matches any text that starts with a, b, or c. MAIN: "a" + "b" + "c"This grammar will match "apple pie", "banana split", and "coconut cluster", but not "orange soda", "watermelon taffy", or "lemon snap". It's also ...

First seen: 2025-04-14 17:05

Last seen: 2025-04-15 02:07