Regex Isn't Hard (2023)

Tue July 11, 2023

Regex gets a bad reputation for being very complex. That’s fair, but I also think that if you focus on a certain core subset of regex, it’s not that hard. Most of the complexity comes from various “shortcuts” that are hard to remember. If you ignore those, the language itself is fairly small and portable across programming languages.

It’s worth knowing regex because you can get A LOT done in very little code. If I try to replicate what a regex does with ordinary procedural code, the result is often verbose, buggy and significantly slower. It often takes hours or days of procedural code to do better than a couple of minutes of writing regex.

NOTE: Some languages, like Rust, have parser combinator libraries that can be as good as or better than regex in most of the ways I care about. However, I often opt for regex anyway because it’s less to fit in my brain.

There’s a single core subset of regex that all major programming languages support. There are four major concepts you need to know:

- Character sets
- Repetition
- Groups
- The |, ^ and $ operators

Here I’ll highlight a subset of the regex language that’s not hard to understand or remember. Throughout, I’ll also tell you what to ignore. Most of those things are shortcuts that save a little verbosity at the expense of a lot of complexity. I’d rather have verbosity than complexity, so I stick to this subset.

Character Sets

A character set is the smallest unit of text matching available in regex. It matches exactly one character.

Single characters

The pattern a matches a single character, always a lowercase a. The pattern aaa is 3 consecutive character sets, each matching only a. The same goes for abc, except that the second and third sets match b and c respectively.

Ranges

Square brackets match one character out of a set of characters.

- [a]: same as just a
- [abc]: matches a, b, or c
- [a-c]: same, but using - to specify a range of characters
- [a-z]: any lowercase ASCII letter
- [a-zA-Z]: any ASCII letter, lowercase or uppercase
- [a-zA-Z0-9!@#$%^&*()-]: alphanumeric plus any of these symbols: !@#$%^&*()-

Note in that last point ...
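The single-character and range rules can be made concrete with a short sketch in Python's standard-library re module. It is an illustration only, not the author's code. The helper name matches is hypothetical. It calls re.fullmatch so that a pattern has to match the whole string rather than just a prefix. The original text does not discuss anchoring, so that choice is an assumption made here so each check shows exactly how many characters a character set consumes.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True only if `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None


# Single characters: each literal is a character set that matches exactly
# one identical character, so both case and order matter.
print(matches(r"a", "a"))      # True
print(matches(r"a", "A"))      # False: a only matches lowercase a
print(matches(r"aaa", "aaa"))  # True: three single-character sets in a row
print(matches(r"abc", "acb"))  # False: the second set only accepts b

# Ranges: [...] matches exactly ONE character drawn from the set.
print(matches(r"[abc]", "b"))     # True
print(matches(r"[abc]", "bc"))    # False: two characters, but only one set
print(matches(r"[a-c]", "c"))     # True: a-c covers a, b and c
print(matches(r"[a-z]", "q"))     # True
print(matches(r"[a-z]", "Q"))     # False: uppercase is outside a-z
print(matches(r"[a-zA-Z]", "Q"))  # True

# The larger class from the list. In Python's re, a - placed just before
# the closing ] is a literal hyphen rather than the start of a range.
symbols = r"[a-zA-Z0-9!@#$%^&*()-]"
print(matches(symbols, "-"))  # True
print(matches(symbols, "%"))  # True
print(matches(symbols, "_"))  # False: underscore is not in the set
```

Swapping re.fullmatch for re.search would make every pattern above also succeed on longer strings that merely contain a match, which hides how many characters each set consumes.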
