At Qodo, we’ve created a new benchmark dataset of real-world questions derived from large, complex code repositories. We are excited to release the dataset, methodology, and prompts used in its creation to support further research and development.

Motivation

Enterprises often maintain massive codebases that are difficult for any individual developer to navigate and fully understand. Whether onboarding, doing routine development, or using AI-assisted workflows, teams often have questions about their codebase. To effectively address this, we’ve developed specialized retrieval capabilities within our research agents. However, to benchmark and validate these systems effectively, we require a robust set of real-world questions and answers.

Prior Work

Existing benchmarks, such as CodeQA, primarily contain artificially generated code with questions limited to provided code snippets, requiring no retrieval from broader contexts. Another recent work (arXiv:2407.02883) involves real-world scenarios but focuses on retrieval from databases rather than code repositories, which does not adequately represent common real-world use-cases.

To address this gap, we propose a new approach. We introduce a benchmark based on realistic questions derived from pull requests that require retrieval across multiple files in a codebase.

Dataset Generation

To effectively challenge retrieval systems, questions in our benchmark must:

- Require deep retrieval, often spanning multiple interconnected files.
- Reflect realistic questions developers encounter when solving actual issues.

We identified that pull requests (PRs) are good sources for complex code changes with proper context that can be used for question and answer generation. PRs naturally link related code, not always through explicit imports or function calls, but through functional changes made together.
We leveraged this insight to generate context: For each code change within a PR, we retrieved its containing method, class or file from the ...
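To make the context step more concrete, here is a minimal sketch of how a changed line might be mapped to its containing method, class, or file. It is an illustration under stated assumptions, not the pipeline used to build the benchmark: it handles only Python source, relies on the standard-library `ast` module, and the function name `context_for_change` and the sample input are hypothetical.

```python
import ast


def context_for_change(source: str, changed_line: int) -> str:
    """Return the innermost function or class containing changed_line.

    Falls back to the whole file when the line sits at module level.
    Hypothetical helper for illustration; assumes valid Python source.
    """
    tree = ast.parse(source)
    best = None
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        if node.lineno <= changed_line <= node.end_lineno:
            # Prefer the definition with the smallest line span (the innermost one).
            span = node.end_lineno - node.lineno
            if best is None or span < best.end_lineno - best.lineno:
                best = node
    if best is None:
        return source  # no enclosing definition: use the file as context
    return ast.get_source_segment(source, best)


# Hypothetical sample file used only to exercise the helper.
SAMPLE = '''\
import os

class Repo:
    def path(self):
        return os.getcwd()

    def name(self):
        return "demo"

VERSION = 1
'''

method_ctx = context_for_change(SAMPLE, 5)   # line inside Repo.path
class_ctx = context_for_change(SAMPLE, 6)    # blank line between methods
file_ctx = context_for_change(SAMPLE, 10)    # module-level assignment
```

Choosing the smallest enclosing definition keeps the context tight for changes inside a method, while changes outside any definition widen to the class or the whole file, mirroring the method, class, or file fallback described above.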