The best Hacker on HackerOne is now an AI

Summary

For the first time in bug bounty history, an autonomous penetration tester has reached the top spot on the US leaderboard.

Our path to the top ranks on HackerOne began with rigorous benchmarking. Since the early days of XBOW, we understood how crucial it was to measure our progress, and we did that in two stages. First, we tested XBOW on existing CTF challenges (from well-known providers like PortSwigger and Pentesterlab). Then we quickly moved on and built our own benchmark that simulates real-world scenarios, ones never used to train LLMs before.

The results were encouraging, but these were still artificial exercises. The logical next step, therefore, was to focus on discovering zero-day vulnerabilities in open source projects, which led to many exciting findings. Some of these were reported on this blog before. In every case, we gave the AI access to source code, simulating a white-box pentest.

While our paying customers were enthusiastic about XBOW’s capabilities, the community raised a key question: how would XBOW perform in real, black-box production environments? We took up that challenge, choosing to compete in one of the largest hacker arenas, where companies serve as the ultimate judges by verifying and triaging vulnerabilities themselves.

Dogfooding AI in Bug Bounties

XBOW is a fully autonomous AI-driven penetration tester. It requires no human input and operates much like a human pentester, but it can scale rapidly, completing comprehensive penetration tests in just a few hours.

When building AI software, it is essential to have precise benchmarks that keep pushing the limit of what’s possible. But when some of those benchmarks evolve into real-world environments, it’s a developer’s dream come true. Discovering bugs in structured benchmarks and open source projects was a fantastic starting point.
However, nothing can truly prepare you for the immense diversity of real-world environments, which span from cutting-edge technologies to 30-year-old legacy s...

First seen: 2025-06-24 18:12

Last seen: 2025-06-25 02:16