Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust


Summary

This crate provides a lightweight Rust implementation for loading and inference of Model2Vec static embedding models. For distillation and training, the Python Model2Vec package can be used.

Quick Start

Add the crate:

    cargo add model2vec-rs

Make embeddings:

    use anyhow::Result;
    use model2vec_rs::model::StaticModel;

    fn main() -> Result<()> {
        // Load a model from the Hugging Face Hub or a local path
        // args = (repo_or_path, token, normalize, subfolder)
        let model = StaticModel::from_pretrained("minishlab/potion-base-8M", None, None, None)?;

        // Prepare a list of sentences
        let sentences = vec![
            "Hello world".to_string(),
            "Rust is awesome".to_string(),
        ];

        // Create embeddings
        let embeddings = model.encode(&sentences);
        println!("Embeddings: {:?}", embeddings);

        Ok(())
    }

Make embeddings with the CLI:

    # Single sentence
    cargo run -- encode "Hello world" minishlab/potion-base-8M

    # Multiple lines from a file
    echo -e "Hello world\nRust is awesome" > input.txt
    cargo run -- encode input.txt minishlab/potion-base-8M --output embeds.json

Make embeddings with custom encode args:

    let embeddings = model.encode_with_args(
        &sentences, // input texts
        Some(512),  // max length
        1024,       // batch size
    );

Models

We provide a number of models that can be used out of the box. These models are available on the HuggingFace hub and can be loaded using the from_pretrained method. The models are listed below.

    Model                    Language      Sentence Transformer  Params  Task
    potion-base-32M          English       bge-base-en-v1.5      32.3M   General
    potion-base-8M           English       bge-base-en-v1.5      7.5M    General
    potion-base-4M           English       bge-base-en-v1.5      3.7M    General
    potion-base-2M           English       bge-base-en-v1.5      1.8M    General
    potion-retrieval-32M     English       bge-base-en-v1.5      32.3M   Retrieval
    M2V_multilingual_output  Multilingual  LaBSE                 471M    General

Performance

We compared the performance of the Rust implementation with the Python version of Model2Vec. The benchmark was run single-threaded on a CPU....
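Once embeddings are computed, a common next step is comparing them. The following is a minimal sketch of cosine similarity and nearest-neighbour lookup over plain float vectors. It assumes that each embedding can be treated as a slice of f32 values (the summary does not state the return type of encode), and the helper names cosine_similarity and most_similar are hypothetical, not part of the model2vec-rs API.

```rust
/// Cosine similarity between two equal-length vectors.
/// Hypothetical helper, not part of model2vec-rs.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Index of the candidate most similar to `query`,
/// or None if `candidates` is empty.
/// Hypothetical helper, not part of model2vec-rs.
fn most_similar(query: &[f32], candidates: &[Vec<f32>]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .map(|(i, c)| (i, cosine_similarity(query, c)))
        .max_by(|x, y| x.1.total_cmp(&y.1))
        .map(|(i, _)| i)
}
```

If the embeddings are one vector of f32 per sentence, as assumed here, a call such as cosine_similarity(&embeddings[0], &embeddings[1]) would score the two sentences from the Quick Start example against each other.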

First seen: 2025-05-18 17:51

Last seen: 2025-05-19 02:53


