## Generate embeddings for search query

Sentence Transformers provide local, easy-to-use embedding models for capturing the semantic meaning of sentences and paragraphs.

This HackerNews dataset contains vector embeddings generated with the all-MiniLM-L6-v2 model.

The example Python script below demonstrates how to generate embedding vectors programmatically using the `sentence_transformers` Python package. The search embedding vector is then passed as an argument to the [cosineDistance()](/sql-reference/functions/distance-functions#cosineDistance) function in the `SELECT` query.

```python
import sys

import clickhouse_connect
from sentence_transformers import SentenceTransformer

print("Initializing...")

# Load the same model that produced the embeddings stored in the dataset
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

chclient = clickhouse_connect.get_client()  # ClickHouse credentials here

while True:
    # Take the search query from the user
    print("Enter a search query :")
    line = sys.stdin.readline()
    if not line:  # EOF: stop instead of looping forever
        break
    input_query = line.strip()
    if not input_query:  # skip blank lines
        continue

    # Run the model and obtain the search vector
    print(f'Generating the embedding for "{input_query}"')
    embeddings = model.encode([input_query])

    print("Querying ClickHouse...")
    # v1: query vector as plain Python floats, v2: number of rows to return
    params = {'v1': embeddings[0].tolist(), 'v2': 20}
    result = chclient.query(
        "SELECT id, title, text FROM hackernews "
        "ORDER BY cosineDistance(vector, %(v1)s) LIMIT %(v2)s",
        parameters=params)

    print("Results :")
    for row in result.result_rows:
        # Print the post id and the first 100 characters of the text
        print(row[0], row[2][:100])
        print("---------")
```

An example run of the script and its similarity search results are shown below (only the first 100 characters of each of the top 20 posts are printed):

```
Initializing...
Enter a search query :
Are OLAP cubes useful
Generating the embedding for "Are OLAP cubes useful"
Querying ClickHouse...
Results :
27742647 smartmic: slt2021: OLAP Cube is not dead, as long as you use some form of:<p>1. GROUP BY multiple fi
---------
27744260 georgewfraser:A data mart is a logical organization of data to help humans understand the schema.
Wh
---------
27761434 mwexler:&qu...
```
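
To make the ordering step more concrete, here is a minimal sketch in plain Python of what ranking rows by cosine distance amounts to. It assumes that `cosineDistance(a, b)` is one minus the cosine similarity of the two vectors. The helper names `cosine_distance` and `rank_by_cosine_distance` and the toy two-dimensional vectors are hypothetical and only illustrate the idea; they are not part of ClickHouse or of the dataset, whose vectors come from the embedding model.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_cosine_distance(query, rows, limit):
    """Sort (id, vector) rows by distance to the query and keep the closest.

    This mirrors ORDER BY cosineDistance(vector, query) LIMIT limit,
    but as a brute-force scan over an in-memory list.
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return ranked[:limit]


# Hypothetical toy data: (post id, embedding) pairs
posts = [
    (1, [0.0, 1.0]),  # orthogonal to the query, distance 1
    (2, [2.0, 0.0]),  # same direction as the query, distance 0
    (3, [1.0, 1.0]),  # 45 degrees from the query
]
query_vector = [1.0, 0.0]

top = rank_by_cosine_distance(query_vector, posts, limit=3)
print([post_id for post_id, _ in top])  # prints [2, 3, 1]
```

Because cosine distance depends only on the angle between vectors, post 2 ranks first even though its vector is twice as long as the query vector.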