Show HN: Spam classifier in Go using Naive Bayes

https://news.ycombinator.com/rss Hits: 5
Summary

nspammer

A Naive Bayes spam classifier implementation in Go. It provides a text classification system that uses the Naive Bayes algorithm with Laplace smoothing to classify messages as spam or not spam.

Features

- Naive Bayes Classification: Uses probabilistic classification based on Bayes' theorem with naive independence assumptions
- Laplace Smoothing: Implements additive smoothing to handle zero probabilities for unseen words
- Training & Classification: Simple API for training on labeled datasets and classifying new messages
- Real Dataset Testing: Includes tests with actual spam/ham email datasets

Installation

    go get github.com/igomez10/nspammer

Usage

Basic Example

    package main

    import (
        "fmt"

        "github.com/igomez10/nspammer"
    )

    func main() {
        // Create training dataset (map[string]bool where true = spam, false = not spam)
        trainingData := map[string]bool{
            "buy viagra now":        true,
            "get rich quick":        true,
            "meeting at 3pm":        false,
            "project update report": false,
        }

        // Create and train classifier
        classifier := nspammer.NewSpamClassifier(trainingData)

        // Classify new messages
        isSpam := classifier.Classify("buy now")
        fmt.Printf("Is spam: %v\n", isSpam)
    }

API

NewSpamClassifier

Creates a new spam classifier and trains it on the provided dataset. The dataset is a map where keys are text messages and values indicate whether the message is spam (true) or not spam (false).

(*SpamClassifier).Classify(input string) bool

Classifies the input text as spam (true) or not spam (false) based on the trained model.
How It Works

The classifier uses the Naive Bayes algorithm:

Training Phase:

- Calculates prior probabilities: P(spam) and P(not spam)
- Builds a vocabulary from all training messages
- Counts word occurrences in spam and non-spam messa...

First seen: 2025-11-16 23:56

Last seen: 2025-11-17 03:56