The best – but not good – way to limit string length

https://news.ycombinator.com/rss Hits: 6
Summary

Getting the length of a string seems simple and is something we do in our code every day. Limiting the length of a string is also extremely common in both frontend and backend code. But both of those actions – especially length limiting – hide a lot of complexity, bug-risk, and even vulnerability danger. In this post, we’re going to examine string length limiting deeply enough to help us fully grok what it means when we do it and how best to do it… and discover that the best still isn’t great.

A TL;DR misses the “fully grok” part, but not everyone has time to read everything, so here are the key takeaways:

- Be aware that there are different ways of measuring string length.
- Really understand how your programming language stores strings in memory, exposes them to you, and determines string length.
- Make an intentional decision about how you’re going to count characters when limiting string length.
- Look carefully at how the “max length” features provided by your language (framework, etc.) actually work. There’s a very good chance that they do not match the limiting method you chose.
- Make sure you use that same counting method across all the layers of your architecture.
- Probably limit by counting normalized Unicode code points. (Like Google recommends.)

With that out of the way, let’s start our investigation by looking at some of our familiar string length functions:

| | | “a” | “字” | “🔤” | “👨‍👩‍👧‍👦” | “र्स्प” | “x̴͙̹̬̑̓͝͝” |
|---|---|---|---|---|---|---|---|
| Go | len(string) | 1 | 3 | 4 | 25 | 15 | 17 |
| JavaScript | String.length | 1 | 1 | 2 | 11 | 5 | 9 |
| Python 3 | len(str) | 1 | 1 | 1 | 7 | 5 | 9 |
| Swift | String.count | 1 | 1 | 1 | 1 | 1 | 1 |

Those four measurements of string length are exemplars of the approaches common to most programming languages: UTF-8 bytes, UTF-16 code units, Unicode code points, and grapheme clusters.

## Character encodings and terminology

There are good explanations of this stuff elsewhere, but let’s try to quickly get a handle on the concepts we need to go further. (Feel free to skip anything you’re already comfortable with.)

First, a working definition of a “character”: This is the human conceptu...
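
To make the different length measurements described in the summary concrete, here is a minimal Python sketch that counts UTF-8 bytes, UTF-16 code units, and Unicode code points for the same string, then limits a string by normalized code points. The function names are hypothetical, and NFC is an assumed normalization form, since the summary does not say which one to use. Grapheme-cluster counting is left out because the Python standard library does not provide it.

```python
import unicodedata


def utf8_len(s: str) -> int:
    """Number of bytes in the UTF-8 encoding (what Go's len(string) reports)."""
    return len(s.encode("utf-8"))


def utf16_len(s: str) -> int:
    """Number of UTF-16 code units (what JavaScript's String.length reports)."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" avoids the byte-order mark.
    return len(s.encode("utf-16-le")) // 2


def code_point_len(s: str) -> int:
    """Number of Unicode code points (what Python 3's len(str) reports)."""
    return len(s)


def limit_code_points(s: str, max_len: int, form: str = "NFC") -> str:
    """Normalize, then keep at most max_len code points.

    Assumption: NFC is used as the normalization form for illustration.
    Slicing by code points can still cut through a grapheme cluster.
    """
    normalized = unicodedata.normalize(form, s)
    return normalized[:max_len]


# The family emoji, written with escapes so the zero-width joiners stay visible:
# four emoji code points joined by three U+200D characters.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

print(utf8_len(family), utf16_len(family), code_point_len(family))  # 25 11 7

# "e" + combining acute accent becomes a single code point under NFC,
# so a limit of 2 keeps the accented letter and the following "a".
print(limit_code_points("e\u0301abc", 2))  # éa
```

Normalizing before counting means two visually identical inputs are held to the same limit, but the result is still only as good as the counting method: a code-point limit applied to the family emoji can leave a partial sequence behind.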

First seen: 2025-04-30 22:29

Last seen: 2025-05-01 03:30