Transforming Your PDFs for RAG with Open Source Using Docling, Milvus, and Feast

https://news.ycombinator.com/rss
Summary

🚀 Quickstart: RAG, Milvus, and Docling with Feast

This project demonstrates how to use Feast to power a Retrieval-Augmented Generation (RAG) application. In particular, this example expands on the basic RAG demo to show:

- How to transform PDFs into text data with Docling during ingestion, so that the text can be used by LLMs
- How to use Milvus as a vector database to store and retrieve embeddings for RAG

It also highlights the following Feast capabilities:

- Online retrieval of features: Ensure real-time access to precomputed document embeddings and other structured data.
- Declarative feature definitions: Define feature views and entities in a Python file and empower Data Scientists to easily ship scalable RAG applications with all of the existing benefits of Feast.
- Vector search: Leverage Feast's integration with vector databases like Milvus to find relevant documents based on a similarity metric (e.g., cosine).
- Structured and unstructured context: Retrieve both embeddings and traditional features, injecting richer context into LLM prompts.
- Versioning and reusability: Collaborate across teams with discoverable, versioned feature transformations.

Project contents:

- data/: Contains the demo data, including Wikipedia summaries of cities with sentence embeddings stored in a Parquet file. Note that you have to use docling-demo.ipynb to construct the docling_samples.parquet file; the metadata_samples.parquet file is provided for you.
- example_repo.py: Defines the feature views and entity configurations for Feast.
- feature_store.yaml: Configures the offline and online stores (using local files and Milvus Lite in this demo).

The project has two main notebooks:

- docling-demo.ipynb: Demonstrates how to use Docling to extract text from PDFs and store the text in a Parquet file.
- docling-quickstart.ipynb: Shows how to use Feast to ingest the text data, store it in the online store, and retrieve it from there.
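To make the vector search idea concrete, the following is a minimal sketch of ranking document chunks by cosine similarity. It uses only the Python standard library and does not call Feast, Milvus, or Docling. The function names (cosine_similarity, top_k), the chunk texts, and the 3-dimensional embeddings are all hypothetical, made up for illustration. In the actual demo, embeddings come from a sentence-embedding model and the search is delegated to Milvus through Feast.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the k (score, text) pairs most similar to the query embedding."""
    scored = [(cosine_similarity(query, emb), text) for text, emb in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical chunks with toy embeddings (illustrative values only).
documents = [
    ("chunk about tables", [1.0, 0.0, 0.0]),
    ("chunk about figures", [0.0, 1.0, 0.0]),
    ("chunk about tables and figures", [1.0, 1.0, 0.0]),
]

query = [1.0, 0.1, 0.0]  # hypothetical query embedding
results = top_k(query, documents, k=2)

for score, text in results:
    print(f"{score:.3f}  {text}")
```

A vector database performs the same kind of ranking, but over an index rather than a linear scan, which is what makes retrieval practical for large document collections.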

First seen: 2025-04-22 14:41

Last seen: 2025-04-22 14:41