# RepoRoulette 🎲: Randomly Sample Repositories from GitHub

Spin the wheel and see which GitHub repositories you get!

## 🚀 Installation

```bash
# Using pip
pip install reporoulette

# From source
git clone https://github.com/gojiplus/reporoulette.git
cd reporoulette
pip install -e .
```

## 📖 Sampling Methods

RepoRoulette provides three distinct methods for random GitHub repository sampling:

### 1. 🎯 ID-Based Sampling

Uses GitHub's sequential repository ID system to generate truly random samples by probing random IDs from the valid ID range. The downside of this method is that the hit rate can be low, since many IDs are invalid (for example, because the repository is private or abandoned). Any filtering on repository characteristics must also wait until you have the repository names. The function keeps sampling until it either reaches `max_attempts` or collects `n_samples`. You can pass a seed for reproducibility.

```python
from reporoulette import IDSampler

# Initialize the sampler
sampler = IDSampler(token="your_github_token")

# Get 50 random repositories
repos = sampler.sample(n_samples=50)

# Print basic stats
print(f"Success rate: {sampler.success_rate:.2f}%")
print(f"Samples collected: {len(repos)}")
```

### 2. ⏱️ Temporal Sampling

Randomly selects time points (date/hour combinations) within a specified range and then retrieves repositories updated during those periods.

```python
from datetime import datetime, timedelta

from reporoulette import TemporalSampler

# Define a date range (last 3 months)
end_date = datetime.now()
start_date = end_date - timedelta(days=90)

# Initialize the sampler
sampler = TemporalSampler(
    token="your_github_token",
    start_date=start_date,
    end_date=end_date,
)

# Get 100 random repositories
repos = sampler.sample(n_samples=100)

# Get repositories with specific characteristics
filtered_repos = sampler.sample(
    n_samples=50,
    min_stars=10,
    languages=["python", "javascript"],
)
```

### 3. 🔍 BigQuery Sampling

The `BigQuerySampler` leverages Google BigQuery's public GitH...
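
To make the stopping rule of ID-based sampling (method 1) concrete, here is a minimal, offline sketch of a probe loop that stops at `n_samples` hits or `max_attempts` probes, whichever comes first. It is not RepoRoulette's actual implementation: `sample_ids`, `max_id`, the `exists` callback, and `fake_exists` are hypothetical names introduced for illustration, and the callback stands in for a real GitHub lookup.

```python
import random
from typing import Callable, List, Optional, Tuple


def sample_ids(
    n_samples: int,
    max_attempts: int,
    max_id: int,
    exists: Callable[[int], bool],
    seed: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Probe random IDs until n_samples hits or max_attempts probes.

    `exists` is a hypothetical stand-in for a real repository lookup.
    Returns the valid IDs found and the success rate in percent.
    """
    rng = random.Random(seed)  # seeded generator for reproducibility
    hits: List[int] = []
    seen = set()
    attempts = 0
    while len(hits) < n_samples and attempts < max_attempts:
        candidate = rng.randint(1, max_id)  # inclusive on both ends
        attempts += 1
        if candidate in seen:
            continue  # a repeated probe still counts as an attempt
        seen.add(candidate)
        if exists(candidate):
            hits.append(candidate)
    success_rate = 100.0 * len(hits) / attempts if attempts else 0.0
    return hits, success_rate


def fake_exists(repo_id: int) -> bool:
    """Toy validity check (assumption): only multiples of 3 are 'public'."""
    return repo_id % 3 == 0


hits, rate = sample_ids(
    n_samples=5, max_attempts=200, max_id=1000, exists=fake_exists, seed=42
)
print(f"Success rate: {rate:.2f}%")
print(f"Samples collected: {len(hits)}")
```

Because invalid IDs consume attempts without producing samples, a low hit rate means `max_attempts` may need to be much larger than `n_samples`.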
First seen: 2025-05-20 15:11
Last seen: 2025-05-20 16:11