Fast Diversification for Search & Retrieval

Pyversity is a fast, lightweight library for diversifying retrieval results. Retrieval systems often return highly similar items. Pyversity efficiently re-ranks these results to encourage diversity, surfacing items that remain relevant but less redundant.

It implements several popular diversification strategies such as MMR, MSD, DPP, and Cover with a clear, unified API. More information about the supported strategies can be found in the supported strategies section. The only dependency is NumPy, making the package very lightweight.

Quickstart

Install pyversity with:

    pip install pyversity

Diversify retrieval results:

    import numpy as np
    from pyversity import diversify, Strategy

    # Define embeddings and scores (e.g. cosine similarities of a query result)
    embeddings = np.random.randn(100, 256)
    scores = np.random.rand(100)

    # Diversify the result
    diversified_result = diversify(
        embeddings=embeddings,
        scores=scores,
        k=10,                   # Number of items to select
        strategy=Strategy.MMR,  # Diversification strategy to use
        diversity=0.5,          # Diversity parameter (higher values prioritize diversity)
    )

    # Get the indices of the diversified result
    diversified_indices = diversified_result.indices

The returned DiversificationResult can be used to access the diversified indices, as well as the selection_scores of the selected strategy and other useful info.

The strategies are extremely fast and scalable: this example runs in milliseconds.

The diversity parameter tunes the trade-off between relevance and diversity: 0.0 focuses purely on relevance (no diversification), while 1.0 maximizes diversity, potentially at the cost of relevance.

Supported Strategies

The following table describes the supported strategies, how they work, their time complexity, and when to use them. The papers linked in the references section provide more in-depth information on the strengths/weaknesses of the supported strategies.
Strategy What It Does ...
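
To make the relevance/diversity trade-off concrete, here is a minimal NumPy sketch of greedy MMR (Maximal Marginal Relevance). It is an illustration under stated assumptions, not Pyversity's actual implementation: the function name `mmr_sketch`, the use of cosine similarity, and the exact weighting `(1 - diversity) * relevance - diversity * redundancy` are all choices made for this example, and the library's internals may differ.

```python
import numpy as np


def mmr_sketch(embeddings, scores, k, diversity=0.5):
    """Greedy MMR re-ranking (hypothetical helper, not part of Pyversity)."""
    embeddings = np.asarray(embeddings, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Normalize rows so that dot products are cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    n = scores.shape[0]
    k = min(k, n)
    available = np.ones(n, dtype=bool)
    selected = []
    max_sim = None  # highest similarity of each item to anything selected so far

    for _ in range(k):
        if max_sim is None:
            # Nothing selected yet: the first pick is by relevance alone.
            mmr = scores.copy()
        else:
            # Reward relevance, penalize redundancy with selected items.
            mmr = (1.0 - diversity) * scores - diversity * max_sim
        mmr[~available] = -np.inf  # never pick the same item twice

        best = int(np.argmax(mmr))
        selected.append(best)
        available[best] = False

        sims = unit @ unit[best]
        max_sim = sims if max_sim is None else np.maximum(max_sim, sims)

    return np.array(selected)


# Items 0 and 1 are duplicates; item 2 is orthogonal to both but scores lower.
toy_embeddings = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
toy_scores = np.array([0.9, 0.8, 0.5])

print(mmr_sketch(toy_embeddings, toy_scores, k=2, diversity=0.0).tolist())  # [0, 1]
print(mmr_sketch(toy_embeddings, toy_scores, k=2, diversity=0.5).tolist())  # [0, 2]
```

With diversity at 0.0 the sketch returns the two highest-scoring items even though they are duplicates; at 0.5 the redundancy penalty outweighs the score gap and the second slot goes to the dissimilar item.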