SSE

The generic, simplest implementation.

    #include <smmintrin.h>  // SSE4.1: _mm_mpsadbw_epu8
    #include <cstddef>
    #include <cstring>      // memcmp
    #include <string>       // std::string::npos

    // Note: bits::get_first_bit_set and bits::clear_leftmost_set are helper
    // functions defined elsewhere. The code assumes needle_size >= 4 and that
    // reading 16 bytes at `needle` and at `s + i` is safe.
    size_t sse4_strstr_anysize(const char* s, size_t n, const char* needle, size_t needle_size) {

        // Only the first 4 bytes of `prefix` are used by MPSADBW below.
        const __m128i prefix = _mm_loadu_si128(reinterpret_cast<const __m128i*>(needle));
        const __m128i zeros  = _mm_setzero_si128();

        // Each iteration tests 8 consecutive starting positions.
        for (size_t i = 0; i < n; i += 8) {

            const __m128i data   = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
            // Eight 16-bit sums of absolute differences between the 4-byte
            // prefix and the 4-byte windows at offsets 0..7 of `data`.
            const __m128i result = _mm_mpsadbw_epu8(data, prefix, 0);

            // A sum equal to zero means the 4-byte window equals the prefix.
            const __m128i cmp    = _mm_cmpeq_epi16(result, zeros);

            // movemask yields two bits per 16-bit element; keep one of each pair.
            unsigned mask = _mm_movemask_epi8(cmp) & 0x5555;

            while (mask != 0) {

                const auto bitpos = bits::get_first_bit_set(mask)/2;

                // The prefix matched; compare the rest of the needle.
                if (memcmp(s + i + bitpos + 4, needle + 4, needle_size - 4) == 0) {
                    return i + bitpos;
                }

                mask = bits::clear_leftmost_set(mask);
            }
        }

        return std::string::npos;
    }

AVX512F

Although AVX512F doesn't support MPSADBW (AVX512BW defines it), we can still use 4-byte prefix equality as a predicate, utilizing the fact that 32-bit elements are natively supported.

In each iteration we generate four AVX512 vectors containing all possible 4-byte prefixes. Example:

    string = "the-cat-tries-to-eat..."

    vec0 = [ t | h | e | - ][ c | a | t | - ][ t | r | i | e ][ s | - | t | o ][ ... ]
    vec1 = [ h | e | - | c ][ a | t | - | t ][ r | i | e | s ][ - | t | o | - ][ ... ]
    vec2 = [ e | - | c | a ][ t | - | t | r ][ i | e | s | - ][ t | o | - | e ][ ... ]
    vec3 = [ - | c | a | t ][ - | t | r | i ][ e | s | - | t ][ o | - | e | a ][ ... ]

Vector vec0 contains the prefixes for positions 0, 4, 8, 12, ...; vec1 for positions 1, 5, 9, 13, ...; vec2 for positions 2, 6, 10, 14, ...; and vec3 for positions 3, 7, 11, 15, and so on.

Building each vector requires two shifts and one bitwise or. In each iteration four vector comparisons are performed and then four bitmasks are examined. This makes the loop that compares substrings quite complicated.

Moreover, to properly fill the last elements of the vectors we need four bytes beyond the current vector. This is accomplished by keeping two adjacent vectors in each iteration. Only one load per iteration is needed, though, because the second vector can be reused as the first one in the next iteration.
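The construction of the four prefix vectors described above can be modelled in portable scalar C++. The following is only a sketch under stated assumptions, not the implementation discussed in this article: a 512-bit register is imitated by an array of sixteen 32-bit lanes, and the names `find_model`, `load_block`, `prefixes` and `pack4` are hypothetical. A vectorized version would replace the per-lane loops with AVX512F instructions. The sketch is meant to show the data movement: two shifts and one bitwise or per vector, with the lane that follows the last one taken from the adjacent vector.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Scalar stand-in for one 512-bit register: sixteen 32-bit lanes (assumption).
constexpr std::size_t kLanes = 16;
constexpr std::size_t kBlock = 4 * kLanes;  // 64 bytes
using Vec = std::array<std::uint32_t, kLanes>;

// Pack four bytes so that p[0] lands in the low byte (little-endian layout).
static std::uint32_t pack4(const char* p) {
    std::uint32_t v = 0;
    for (int b = 3; b >= 0; --b) {
        v = (v << 8) | static_cast<unsigned char>(p[b]);
    }
    return v;
}

// Load 64 bytes starting at s[offset]; bytes past the end of s read as zero.
static Vec load_block(const std::string& s, std::size_t offset) {
    char buf[kBlock] = {0};
    if (offset < s.size()) {
        std::memcpy(buf, s.data() + offset, std::min(kBlock, s.size() - offset));
    }
    Vec v{};
    for (std::size_t j = 0; j < kLanes; ++j) v[j] = pack4(buf + 4 * j);
    return v;
}

// Build vec_k: lane j holds the four bytes starting at byte 4*j + k.
// Two shifts and one bitwise or; the last lane borrows from `next`.
static Vec prefixes(const Vec& curr, const Vec& next, unsigned k) {
    assert(k < 4);
    Vec out{};
    for (std::size_t j = 0; j < kLanes; ++j) {
        const std::uint32_t lo = curr[j];
        const std::uint32_t hi = (j + 1 < kLanes) ? curr[j + 1] : next[0];
        out[j] = (k == 0) ? lo : ((lo >> (8 * k)) | (hi << (32 - 8 * k)));
    }
    return out;
}

// Hypothetical driver: returns the first occurrence of needle (size >= 4) in s.
std::size_t find_model(const std::string& s, const std::string& needle) {
    if (needle.size() < 4 || s.size() < needle.size()) return std::string::npos;

    const std::uint32_t prefix = pack4(needle.data());
    Vec curr = load_block(s, 0);

    for (std::size_t i = 0; i < s.size(); i += kBlock) {
        const Vec next = load_block(s, i + kBlock);  // the one load per iteration
        const Vec vecs[4] = {prefixes(curr, next, 0), prefixes(curr, next, 1),
                             prefixes(curr, next, 2), prefixes(curr, next, 3)};

        // Visit candidates in increasing position order (position = 4*j + k).
        for (std::size_t j = 0; j < kLanes; ++j) {
            for (unsigned k = 0; k < 4; ++k) {
                if (vecs[k][j] != prefix) continue;
                const std::size_t pos = i + 4 * j + k;
                // Prefix matched: verify bounds, then compare the remaining bytes.
                if (pos + needle.size() <= s.size() &&
                    s.compare(pos + 4, needle.size() - 4, needle, 4, std::string::npos) == 0) {
                    return pos;
                }
            }
        }
        curr = next;  // reuse the second vector as the first one next time
    }
    return std::string::npos;
}
```

For the example string, `find_model("the-cat-tries-to-eat", "tries")` reports position 8: the prefix "trie" is found in lane 2 of vec0. The scalar model checks lanes one at a time, whereas the vector version compares all sixteen lanes of each vector at once and then examines the four resulting bitmasks.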
Finally, instruction VP...