Show HN: CocoIndex – Open-Source Data framework for AI, built for data freshness

https://news.ycombinator.com/rss

Summary

Extract, Transform, Index Data. Easy and Fresh. 🌴

CocoIndex is the world's first open-source engine that supports both custom transformation logic and incremental updates specialized for data indexing.

Quick Start

If you're new to CocoIndex 🤗, we recommend checking out the 📖 Documentation and ⚡ Quick Start Guide. We also have a ▶️ quick start video tutorial to help you get started.

Setup

1. Install the CocoIndex Python library:

    pip install -U cocoindex

2. Set up Postgres with the pgvector extension, or bring up a Postgres database using Docker Compose:

   - Make sure Docker Compose is installed (see the Docker Compose docs).
   - Start a Postgres database for CocoIndex using our Docker Compose config:

    docker compose -f <(curl -L https://raw.githubusercontent.com/cocoindex-io/cocoindex/refs/heads/main/dev/postgres.yaml) up -d

Start your first indexing flow!

Follow the Quick Start Guide to define your first indexing flow. A common indexing flow looks like:

    import cocoindex

    @cocoindex.flow_def(name="TextEmbedding")
    def text_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
        # Add a data source to read files from a directory
        data_scope["documents"] = flow_builder.add_source(
            cocoindex.sources.LocalFile(path="markdown_files"))

        # Add a collector for data to be exported to the vector index
        doc_embeddings = data_scope.add_collector()

        # Transform data of each document
        with data_scope["documents"].row() as doc:
            # Split the document into chunks, put into `chunks` field
            doc["chunks"] = doc["content"].transform(
                cocoindex.functions.SplitRecursively(),
                language="markdown", chunk_size=2000, chunk_overlap=500)

            # Transform data of each chunk
            with doc["chunks"].row() as chunk:
                # Embed the chunk, put into `embedding` field
                chunk["embedding"] = chunk["text"].transform(
                    cocoindex.functions.SentenceTransformerEmbed(
                        model="sentence-transformers/all-MiniLM-L6-v2"))

                # Collect the chunk into the collector.
                doc_embeddings.collect(file...
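
The flow above splits each document into overlapping chunks (chunk_size, chunk_overlap), and the summary names incremental updates as the project's focus without describing how they work. As a rough illustration of both ideas, here is a minimal, library-free sketch. Everything in it is an assumption made for illustration: `split_with_overlap` and `IncrementalIndex` are hypothetical names, not part of CocoIndex; the fixed-size sliding window is a simplified stand-in for `SplitRecursively`; and the content-hash comparison is only one possible way to detect changed documents, not necessarily what CocoIndex does.

```python
import hashlib


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size sliding-window splitter (a simplification, not SplitRecursively)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


class IncrementalIndex:
    """Toy in-memory index (hypothetical) that re-chunks a document only when it changes."""

    def __init__(self, chunk_size: int = 2000, chunk_overlap: int = 500):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self._fingerprints: dict[str, str] = {}  # document name -> content hash
        self.chunks: dict[str, list[str]] = {}   # document name -> its chunks

    def update(self, documents: dict[str, str]) -> list[str]:
        """Sync the index with `documents`; return the names that were reprocessed."""
        reprocessed = []
        for name, content in documents.items():
            fingerprint = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self._fingerprints.get(name) == fingerprint:
                continue  # unchanged since the last run: skip the expensive work
            self._fingerprints[name] = fingerprint
            self.chunks[name] = split_with_overlap(
                content, self.chunk_size, self.chunk_overlap)
            reprocessed.append(name)
        # Drop index entries for documents that no longer exist in the source.
        for name in list(self._fingerprints):
            if name not in documents:
                del self._fingerprints[name]
                del self.chunks[name]
        return reprocessed
```

Calling `update` twice with the same documents reprocesses nothing on the second call; only new or modified documents are re-chunked. In a real pipeline the skipped step would also include the embedding call, which is typically the costly part.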

First seen: 2025-04-24 01:48

Last seen: 2025-04-24 01:48

