Slightly better named character reference tokenization than Chrome, Safari, FF

https://news.ycombinator.com/rss Hits: 9
Summary

Slightly better named character reference tokenization than Chrome, Safari, and Firefox

2025-06-26 - Programming

Note: I am not a 'browser engine' person, nor a 'data structures' person. I'm certain that an even better implementation than what I came up with is very possible.

A while back, for no real reason, I tried writing an implementation of a data structure tailored to the specific use case of the Named character reference state of HTML tokenization (here's the link to that experiment). Recently, I took that implementation, ported it to C++, and used it to make some efficiency gains and fix some spec compliance issues in the Ladybird browser.

Throughout this, I never actually looked at the implementations used in any of the major browser engines (no reason for this, just me being dumb). However, now that I have looked at Blink/WebKit/Gecko (Chrome/Safari/Firefox, respectively), I've realized that my implementation seems to be either on par or better across the metrics that the browser engines care about:

- Efficiency (at least as fast, if not slightly faster)
- Compactness of the data (uses ~60% of the data size of Chrome's/Firefox's implementation)
- Ease of use

Note: I'm singling out these metrics because the Python script that generates the data structures used for named character reference tokenization in Blink (the browser engine of Chrome/Chromium) contains this docstring (emphasis mine):

"""This python script creates the raw data that is our entity database. The representation is one string database containing all strings we could need, and then a mapping from offset+length -> entity data. That is compact, easy to use and efficient."""

So, I thought I'd take you through what I came up with and how it compares to the implementations in the major browser engines. Mostly, though, I just think the data structure I used is neat and want to tell you about it (fair warning: it's not novel).

What is a named character reference?

A named character reference is ...
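To make the layout described in the Blink docstring a little more concrete, here is a minimal sketch of "one string database plus an offset+length mapping". It is not Blink's actual code: the five-entry `ENTITIES` table, the function names `build_database` and `lookup`, and the choice of binary search for the lookup are all assumptions made purely for illustration.

```python
# Hypothetical miniature entity set; the real table has over 2,000 entries.
ENTITIES = {
    "amp;": "&",
    "lt;": "<",
    "gt;": ">",
    "not": "\u00ac",     # legacy name that is valid without a semicolon
    "notin;": "\u2209",
}


def build_database(entities):
    """Pack every name into one string and record (offset, length, value)."""
    names = sorted(entities)
    database = "".join(names)
    table = []
    offset = 0
    for name in names:
        table.append((offset, len(name), entities[name]))
        offset += len(name)
    return database, table


def lookup(database, table, name):
    """Binary search the table, comparing against slices of the database."""
    lo, hi = 0, len(table)
    while lo < hi:
        mid = (lo + hi) // 2
        offset, length, value = table[mid]
        candidate = database[offset:offset + length]
        if candidate == name:
            return value
        if candidate < name:
            lo = mid + 1
        else:
            hi = mid
    return None


DATABASE, TABLE = build_database(ENTITIES)
print(DATABASE)                           # all names concatenated
print(lookup(DATABASE, TABLE, "notin;"))  # the mapped character, or None
```

The point of the layout is that the names live in a single contiguous string, so each table entry only needs two small integers and the entity data rather than its own string object.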
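As a small illustration of the term (not taken from the article), Python's standard `html` module ships the HTML named character reference table as `html.entities.html5` and decodes references with `html.unescape`. The inputs `&notin;` and `&notit;` are chosen here only as examples; the second shows why tokenization has to find the longest matching name rather than do a simple exact lookup.

```python
import html
from html.entities import html5

# "&notin;" is a complete named character reference.
full_match = html.unescape("&notin;")

# "&notit;" is not a known name, but its prefix "not" is a legacy
# reference that is valid without a trailing semicolon.
prefix_match = html.unescape("&notit;")

print(repr(full_match))    # '∉'
print(repr(prefix_match))  # '¬it;'
print("notin;" in html5, "not" in html5, "notit;" in html5)
```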

First seen: 2025-06-27 20:28

Last seen: 2025-06-28 04:29