Faster substring search with SIMD in ZigPublished 10.08.2025I’ve been learning a lot about low-level programming languages lately, and for a long time there has been one thing that has interested me: SIMD (or ‘single instruction, multiple data’) code. I’ve seen a lot of articles about having massive performance gains by utilizing SIMD and wanted to learn how to do it myself.This article is a journey into implementing ~60% faster substring searching compared to Zig’s std.mem.indexOf using a SIMD-friendly algorithm.Baseline #This is the baseline function that we will be comparing against:fn find_substr(needle: []const u8, haystack: []const u8) ?usize { return std.mem.indexOf(u8, haystack, needle); } It’s the closest thing to a substring search function from Zig’s standard library. It returns the first index of a subsequence – or null if not found.SIMD algorithm #This algorithm is taken directly from Wojciech Muła’s fantastic article SIMD-friendly algorithms for substring searching, which seems to have the best algorithms for finding substrings in a large body of text.Here’s how the algorithm works: say that we want to find the index of the word “blue” (the needle) in “It was a beautiful, bounteous, blue day” (the haystack). First, we extract the first and last character of the needle (‘b’ and ’e’) and store them in a variable.Then we will loop through all of the characters in haystack, loading the next 32 characters (bytes) from memory into a SIMD register and comparing each character (byte) in the register with ‘b’. This will result in a mask containing 32 bytes, 1 if the character is ‘b’ and 0 in all other cases.We will do the same with the last character, but load the characters with an offset (needle.len - 1).Without the offset, any match that starts in one 32‑byte chunk and ends in the next would be missed. With this method, we can also check for needles that are longer than 32 characters.The result will be two bit masks, First and Last, where we can use bit-wise...
First seen: 2025-08-11 10:49
Last seen: 2025-08-12 01:51