Qodo CLI agent scores 71.2% on SWE-bench Verified

Summary

We’re excited to announce that Qodo Command, our CLI agent, achieved a score of 71.2% on SWE-bench Verified (submission pending review), the leading benchmark for evaluating AI agents on real-world software engineering tasks. This result is a strong signal that Qodo’s agents are built for the realities of production development. For use cases like reviewing code, writing tests, fixing bugs, and generating features, our CLI agent goes beyond autocomplete to deliver thoughtful, context-aware, and high-integrity code.

One-Shot, Real-World Execution

Most AI benchmarks evaluate agents in isolated, simplified environments. SWE-bench Verified, by contrast, tests coding agents in messy, complex, real-world software engineering scenarios. Each test case is built from a real GitHub issue in one of 12 widely used open-source Python repositories. Agents are given the GitHub issue and the codebase in the state it was in when the issue was opened, and must reason, plan, and edit code, iterating over many turns as a developer would, without shortcutting the problem.

Qodo Command scored 71.2% in a single run of its production version, with no fine-tuning or benchmark-specific adjustments, exactly as any developer would run it out of the box after installing the package: npm install -g @qodo/command.

LLM Model Flexibility & Claude Partnership

While Qodo Command is designed to support all top-tier LLMs, Claude 4 emerged as our model of choice for the SWE-bench Verified results. Thanks to a strong partnership with Anthropic (Qodo is a “Powered by Claude” solution), we’re collaboratively building the world’s most adaptive and learning-oriented coding agents, leveraging one of the most advanced language models available today.

The Architecture Behind Our 71.2% SWE-bench Success

Achieving high performance on SWE-bench wasn’t about optimizing for the benchmark; it was the natural result of engineering Qodo Command to excel at real-world software engineer...

First seen: 2025-08-12 11:53

Last seen: 2025-08-12 18:54
