Binmoji: A 64-bit emoji encoding

https://news.ycombinator.com/rss
Summary

binmoji: a compact, lossless, 64-bit emoji encoding
Specification

binmoji is a C library and command-line tool that encodes any standard Unicode emoji into a single, fixed-size 64-bit integer (uint64_t). This gives a compact, fixed-width, and indexable alternative to storing emojis as variable-length UTF-8 strings.

Key Features ✨

- Compact Storage: Represents any emoji, no matter how complex, as a single uint64_t.
- High Performance: Fast encoding and decoding with minimal overhead.
- Lossless Conversion: Guarantees perfect round-trip conversion from emoji to ID and back.
- Unicode Compliant: The lookup table is generated from the official Unicode emoji data files, ensuring compatibility.
- Self-Contained: Includes a test suite to verify correctness against the Unicode standard.
- Low hash table bloat: Skin tones are stored in dedicated bit fields rather than as separate table entries, so the lookup table stays small (~158 entries).
- C89: Written in portable C89, so it builds with virtually any C compiler.

How It Works ⚙️

An emoji sequence is deconstructed into its fundamental parts, which are then packed into a 64-bit integer:

Bits (63-0)  Field Name         Size (bits)  Description
63-42        Primary Codepoint  22           The first base emoji in the sequence (e.g., '👩').
41-10        Component Hash     32           A CRC-32 hash of the rest of the sequence (e.g., '‍👩‍👧‍👦').
9-7          Skin Tone 1        3            The first skin tone modifier.
6-4          Skin Tone 2        3            The second skin tone modifier (for couple/family emojis).
3-0          Flags              4            Reserved for future use.
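The bit layout described above can be illustrated with a small packing and unpacking sketch. Only the field positions and widths come from the layout; the function names (pack_id, get_primary, tone_from_codepoint, and so on) and the skin-tone numbering (0 for no modifier, 1 to 5 for U+1F3FB to U+1F3FF) are hypothetical assumptions, not binmoji's actual API or encoding. This sketch uses C99's <stdint.h> for uint64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Field offsets taken from the bit-layout table. */
#define PRIMARY_SHIFT 42 /* bits 63-42, 22 bits */
#define HASH_SHIFT    10 /* bits 41-10, 32 bits */
#define TONE1_SHIFT    7 /* bits 9-7,   3 bits  */
#define TONE2_SHIFT    4 /* bits 6-4,   3 bits  */
                         /* bits 3-0:   4 flag bits */

/* Hypothetical helper: packs the five fields into one 64-bit ID. */
static uint64_t pack_id(uint32_t primary, uint32_t hash,
                        unsigned tone1, unsigned tone2, unsigned flags)
{
    assert(primary <= 0x10FFFFu); /* every Unicode codepoint fits in 22 bits */
    assert(tone1 <= 7u && tone2 <= 7u && flags <= 15u);
    return ((uint64_t)primary << PRIMARY_SHIFT)
         | ((uint64_t)hash << HASH_SHIFT)
         | ((uint64_t)tone1 << TONE1_SHIFT)
         | ((uint64_t)tone2 << TONE2_SHIFT)
         | (uint64_t)flags;
}

/* Hypothetical accessors: one shift and one mask per field. */
static uint32_t get_primary(uint64_t id) { return (uint32_t)(id >> PRIMARY_SHIFT); }
static uint32_t get_hash(uint64_t id) { return (uint32_t)((id >> HASH_SHIFT) & 0xFFFFFFFFu); }
static unsigned get_tone1(uint64_t id) { return (unsigned)((id >> TONE1_SHIFT) & 0x7u); }
static unsigned get_tone2(uint64_t id) { return (unsigned)((id >> TONE2_SHIFT) & 0x7u); }
static unsigned get_flags(uint64_t id) { return (unsigned)(id & 0xFu); }

/* Assumed tone numbering: 0 = no modifier,
 * 1..5 = U+1F3FB..U+1F3FF (Fitzpatrick type 1-2 through type 6). */
static unsigned tone_from_codepoint(uint32_t cp)
{
    if (cp >= 0x1F3FBu && cp <= 0x1F3FFu)
        return (unsigned)(cp - 0x1F3FBu) + 1u;
    return 0u;
}
```

Under these assumptions, 👩🏽 (U+1F469 followed by the medium skin tone modifier U+1F3FD) would be packed as pack_id(0x1F469, hash, tone_from_codepoint(0x1F3FD), 0, 0), and each get_* call recovers one field. Because every field sits at a fixed offset, the resulting IDs can be stored, compared, and indexed as plain integers.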
Because hashing is a one-way process, a pre-computed lookup table (binmoji_table.h) is used to map a Component Hash back to its orig...
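Since a CRC-32 cannot be inverted, decoding needs a table keyed by hash. The sketch below assumes the Component Hash is the standard CRC-32 (reflected polynomial 0xEDB88320, the variant used by zlib) computed over the UTF-8 bytes of everything after the primary codepoint; binmoji may hash a different representation. The component strings and the names crc32_bytes, build_component_table, and lookup_components are hypothetical, and this sketch builds its table at startup, whereas binmoji ships a pre-computed binmoji_table.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit.
 * Assumed hash function; binmoji's exact input encoding may differ. */
static uint32_t crc32_bytes(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int k;
    for (i = 0; i < n; i++) {
        crc ^= (uint32_t)p[i];
        for (k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical component strings: the UTF-8 bytes that follow the
 * primary codepoint in a ZWJ sequence. */
static const char *const kComponents[] = {
    /* ZWJ U+1F469 ZWJ U+1F467 ZWJ U+1F466 (rest of a family emoji) */
    "\xE2\x80\x8D\xF0\x9F\x91\xA9\xE2\x80\x8D\xF0\x9F\x91\xA7"
    "\xE2\x80\x8D\xF0\x9F\x91\xA6",
    /* ZWJ U+1F4BB (rest of "man technologist", U+1F468 ZWJ U+1F4BB) */
    "\xE2\x80\x8D\xF0\x9F\x92\xBB"
};

#define NUM_COMPONENTS (sizeof kComponents / sizeof kComponents[0])

struct component_entry {
    uint32_t hash;    /* CRC-32 of the component bytes */
    const char *utf8; /* the component bytes themselves */
};

static struct component_entry component_table[NUM_COMPONENTS];

/* Fills the reverse table once at startup (binmoji pre-computes its table). */
static void build_component_table(void)
{
    size_t i;
    for (i = 0; i < NUM_COMPONENTS; i++) {
        component_table[i].hash = crc32_bytes(
            (const unsigned char *)kComponents[i], strlen(kComponents[i]));
        component_table[i].utf8 = kComponents[i];
    }
}

/* Maps a Component Hash back to its component bytes; NULL if unknown. */
static const char *lookup_components(uint32_t hash)
{
    size_t i;
    for (i = 0; i < NUM_COMPONENTS; i++)
        if (component_table[i].hash == hash)
            return component_table[i].utf8;
    return NULL;
}
```

With this CRC variant an empty component string hashes to 0, so a lone base emoji would naturally get a Component Hash of 0. The linear scan keeps the sketch short; a generated table of ~158 entries should also be checked for hash collisions and could be sorted for binary search.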

First seen: 2025-10-24 05:34

Last seen: 2025-10-24 08:35