Show HN: I modeled the Voynich Manuscript with SBERT to test for structure

https://news.ycombinator.com/rss Hits: 6
Summary

📜 Voynich Manuscript Structural Analysis

🔍 Overview

This started as a personal challenge to figure out what modern NLP could tell us about the Voynich Manuscript, without falling into translation speculation or pattern hallucination. I'm not a linguist or cryptographer. I just wanted to see if something as strange as Voynichese would hold up under real language modeling: clustering, POS inference, Markov transitions, and section-specific patterns. Spoiler: it kinda did.

This repo walks through everything, from suffix stripping to SBERT embeddings to building a lexicon hypothesis. No magic, no GPT guessing. Just a skeptical test of whether the manuscript has structure that behaves like language, even if we don't know what it's saying.

🧠 Why This Matters

The Voynich Manuscript remains undeciphered, with no agreed linguistic or cryptographic solution. Traditional analyses often fall into two camps: statistical entropy checks or wild guesswork. This project offers a middle path: using computational linguistics to assess whether the manuscript encodes real, structured language-like behavior.
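To make the "Markov transitions" idea from the overview concrete, here is a minimal sketch of a first-order transition matrix built from per-line sequences of cluster IDs. It is not the repo's code: the function name `transition_matrix` and the sample sequences are made up for illustration, and the real scripts may count or normalise differently.

```python
from collections import Counter, defaultdict


def transition_matrix(sequences):
    """Return row-normalised first-order transition probabilities.

    `sequences` is an iterable of lists of cluster IDs, one list per line.
    The result maps each cluster ID to a dict of {next_id: probability}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair every cluster ID with the one that follows it on the same line.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1

    matrix = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        matrix[current] = {nxt: n / total for nxt, n in followers.items()}
    return matrix


# Hypothetical cluster-ID sequences (invented data, not from the manuscript).
lines = [
    [3, 1, 3, 2],
    [3, 1, 2],
    [1, 3, 1],
]
probs = transition_matrix(lines)
print(probs[3])  # how often each cluster follows cluster 3
```

Transitions are counted within a line only, so the last token of one line is never linked to the first token of the next. Whether that boundary choice matches the project's approach is an assumption.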
📁 Project Structure

/data/
    AB.docx                           # Full transliteration with folio/line tags
    voynichese/                       # Root word .txt files
    stripped_cluster_lookup.json      # Cluster ID per stripped root
    unique_stripped_words.json        # All stripped root forms
    voynich_line_clusters.csv         # Cluster sequences per line
/scripts/
    cluster_roots.py                  # SBERT clustering + suffix stripping
    map_lines_to_clusters.py          # Maps manuscript lines to cluster IDs
    pos_model.py                      # Infers grammatical roles from cluster behavior
    transition_matrix.py              # Builds and visualizes cluster transitions
    lexicon_builder.py                # Creates a candidate lexicon by section and role
    cluster_language_similarity.py    # (Optional) Compares clusters to real-world languages
/results/
    Figure_1.png                      # SBERT clusters (PCA reduced)
    transition_matrix_heatmap.png     # Markov transition matrix
    cluster_role_summary.csv
    cluster_transition_matrix.csv
    lexicon_candidates.csv

✅ Key Contr...

First seen: 2025-05-18 16:51

Last seen: 2025-05-18 21:52