Humanely Dealing with Humungus Crawlers

Summary

I host a bunch of hobby code on my server. I would think it’s really only interesting to me, but it turns out every day, thousands of people from all over the world are digging through my code, reviewing years old changesets. On the one hand, wow, thanks, this is very flattering. On the other hand, what the heck is wrong with you?

This has been building up for a while, and I’ve been intermittently developing and deploying countermeasures. It’s been a lot like solving a sliding block puzzle. Lots of small moves and changes, and eventually it starts coming together.

My primary principle is that I’d rather not annoy real humans more than strictly intended. If there’s a challenge, it shouldn’t be too difficult, but ideally, we want to minimize the number of challenges presented. You should never suspect that I suspected you of being an enemy agent.

First measure is we only challenge on the deep URLs. So, for instance, I can link to the anticrawl repo no problem, or even the source for anticrawl.go, and that’ll be served immediately. All the pages any casual browser would visit make up less than 1% of the possible URLs that exist, but probably contain 99% of the interesting content.

Also, these pages get cached by the reverse proxy first, so anticrawl doesn’t even evaluate them. We’ve already done the work to render the page, and we’re trying to shed load, so why would I want to increase load by generating challenges and verifying responses? It annoys me when I click a seemingly popular blog post and immediately get challenged, when I’m 99.9% certain that somebody else clicked it two seconds before me. Why isn’t it in cache? We must have different objectives in what we’re trying to accomplish. Or who we’re trying to irritate.

The next step is that anybody loading style.css gets marked friendly. Big Basilisk doesn’t care about my artisanal styles, but most everybody else loves them.
So if you start at a normal page, and then start clicking deeper, that’s fine, still no challen...
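
To make the order of those checks concrete, here is a rough sketch in Go. It is not the actual anticrawl code. Everything in it is an assumption made for illustration: the names (`friendlySet`, `isDeep`, `shouldChallenge`), the idea that a "deep" URL is one with more than three path segments, and the use of an opaque client string (say, a remote address) as the key. The reverse proxy cache is not modeled at all; it sits in front, so a cached page would never reach this function.

```go
package main

import (
	"strings"
	"sync"
)

// maxShallowSegments is a hypothetical cutoff. Paths with at most this many
// segments are treated as shallow and are never challenged.
const maxShallowSegments = 3

// friendlySet remembers which clients have fetched the stylesheet.
// The client key is an assumption; it could be a remote address or a cookie.
type friendlySet struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newFriendlySet() *friendlySet {
	return &friendlySet{seen: make(map[string]bool)}
}

// mark records a client as friendly.
func (f *friendlySet) mark(client string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.seen[client] = true
}

// has reports whether a client was previously marked friendly.
func (f *friendlySet) has(client string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.seen[client]
}

// isDeep reports whether a path has more segments than the cutoff.
// Empty segments (leading or doubled slashes) are ignored.
func isDeep(path string) bool {
	segs := strings.FieldsFunc(path, func(r rune) bool { return r == '/' })
	return len(segs) > maxShallowSegments
}

// shouldChallenge applies the checks in order: loading the stylesheet marks
// the client friendly, shallow pages are always served, and deep pages are
// challenged only for clients that are not yet friendly.
func shouldChallenge(f *friendlySet, client, path string) bool {
	if strings.HasSuffix(path, "/style.css") {
		f.mark(client)
		return false
	}
	if !isDeep(path) {
		return false
	}
	return !f.has(client)
}
```

In this sketch, a visitor who lands on a shallow page, picks up style.css along the way, and then clicks deeper never sees a challenge, while a client that jumps straight to a deep URL without ever asking for the stylesheet does.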

First seen: 2025-09-12 18:02

Last seen: 2025-09-13 00:12