## Why Build Your Own?

Look, I know what you're thinking. "Why not just use Elasticsearch?" or "What about Algolia?" Those are valid options, but they come with complexity. You need to learn their APIs, manage their infrastructure, and deal with their quirks. Sometimes you just want something that:

- Works with your existing database
- Doesn't require external services
- Is easy to understand and debug
- Actually finds relevant results

That's what I built: a search engine that uses your existing database, respects your current architecture, and gives you full control over how it works.

## The Core Idea

The concept is simple: tokenize everything, store it, then match tokens when searching. Here's how it works:

1. **Indexing:** When you add or update content, we split it into tokens (words, prefixes, n-grams) and store them with weights
2. **Searching:** When someone searches, we tokenize their query the same way, find matching tokens, and score the results
3. **Scoring:** We use the stored weights to calculate relevance scores

The magic is in the tokenization and weighting. Let me show you what I mean.

## Building Block 1: The Database Schema

We need two simple tables: `index_tokens` and `index_entries`.

### index_tokens

This table stores all unique tokens with their tokenizer weights. Each token name can have multiple records with different weights, one per tokenizer.

```
// index_tokens table structure
id | name   | weight
---|--------|-------
1  | parser | 20      // From WordTokenizer
2  | parser | 5       // From PrefixTokenizer
3  | parser | 1       // From NGramsTokenizer
4  | parser | 10      // From SingularTokenizer
```

Why store separate tokens per weight? Different tokenizers produce the same token with different weights. For example, "parser" from WordTokenizer has weight 20, but "parser" from PrefixTokenizer has weight 5. We need separate records to properly score matches. The unique constraint is on `(name, weight)`, so the same token name can exist multiple times with different weights.

### index_entries

This table links tokens to documents.
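To make the schema concrete, here's a minimal SQLite sketch. The `index_tokens` columns and the `(name, weight)` unique constraint come straight from the table above; the `index_entries` columns (`token_id`, `document_id`) are my assumption for illustration, since all we've said so far is that it links tokens to documents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for your existing database

# index_tokens: one row per (name, weight) pair. The same token name
# appears once per tokenizer weight, enforced by the unique constraint.
conn.execute("""
    CREATE TABLE index_tokens (
        id     INTEGER PRIMARY KEY,
        name   TEXT    NOT NULL,
        weight INTEGER NOT NULL,
        UNIQUE (name, weight)
    )
""")

# index_entries: links tokens to the documents they appear in.
# The exact columns here are assumed for this sketch.
conn.execute("""
    CREATE TABLE index_entries (
        id          INTEGER PRIMARY KEY,
        token_id    INTEGER NOT NULL REFERENCES index_tokens (id),
        document_id INTEGER NOT NULL
    )
""")
```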
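And to tie the three "Core Idea" steps together, here's a sketch of the full tokenize-store-match flow, reusing the `conn` from the schema sketch above. The tokenizer functions are deliberately naive stand-ins (SingularTokenizer is omitted for brevity); only the weights (20 for whole words, 5 for prefixes, 1 for trigrams) come from the example table.

```python
from typing import Iterator

# Naive stand-ins for the tokenizers; weights match the example table.
def word_tokens(text: str) -> Iterator[tuple[str, int]]:
    for word in text.lower().split():
        yield word, 20                      # WordTokenizer: whole words

def prefix_tokens(text: str) -> Iterator[tuple[str, int]]:
    for word in text.lower().split():
        for i in range(2, len(word)):
            yield word[:i], 5               # PrefixTokenizer: word prefixes

def ngram_tokens(text: str) -> Iterator[tuple[str, int]]:
    for word in text.lower().split():
        for i in range(len(word) - 2):
            yield word[i:i + 3], 1          # NGramsTokenizer: trigrams

def tokenize(text: str) -> set[tuple[str, int]]:
    tokens = set()                          # dedupe (name, weight) pairs
    for tokenizer in (word_tokens, prefix_tokens, ngram_tokens):
        tokens.update(tokenizer(text))
    return tokens

def index_document(conn, doc_id: int, text: str) -> None:
    """Indexing: split content into weighted tokens and store them."""
    for name, weight in tokenize(text):
        conn.execute(
            "INSERT OR IGNORE INTO index_tokens (name, weight) VALUES (?, ?)",
            (name, weight),
        )
        (token_id,) = conn.execute(
            "SELECT id FROM index_tokens WHERE name = ? AND weight = ?",
            (name, weight),
        ).fetchone()
        conn.execute(
            "INSERT INTO index_entries (token_id, document_id) VALUES (?, ?)",
            (token_id, doc_id),
        )

def search(conn, query: str) -> list[tuple[int, int]]:
    """Searching/scoring: tokenize the query the same way, match token
    names, and rank documents by the sum of matched token weights."""
    names = {name for name, _ in tokenize(query)}
    if not names:
        return []
    placeholders = ", ".join("?" * len(names))
    return conn.execute(
        f"""SELECT e.document_id, SUM(t.weight) AS score
            FROM index_entries e
            JOIN index_tokens t ON t.id = e.token_id
            WHERE t.name IN ({placeholders})
            GROUP BY e.document_id
            ORDER BY score DESC""",
        tuple(names),
    ).fetchall()

index_document(conn, 1, "A fast query parser")
index_document(conn, 2, "Parsing logs with regular expressions")
print(search(conn, "parser"))  # doc 1 ranks first: exact word match (weight 20)
```

Note that matching is by token name only. Because the same name is stored once per weight, a single query token picks up a weight contribution from every tokenizer that produced it, which is exactly why the `(name, weight)` rows are kept separate.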