You should feed the bots
Aug 3, 2025 (Programming)

A week ago, I set up an infinite nonsense crawler trap – now it makes up 99% of my server’s traffic. What surprised me is that feeding scrapers garbage is the cheapest and easiest thing I could do.

Meet the bots:

These aren’t the indexing bots of old, but scrapers collecting data to train LLMs. Unlike search engines, which need the websites they crawl to stay up, AI companies provide a replacement.

It should come as no surprise that these bots are aggressive and relentless: They ignore robots.txt, and if you block them by user agent, they just pretend to be a browser. If you ban their IP, they switch addresses.

… all while sending multiple requests per second, all day, every day.

Giving up:

So what if we let them access the site?

Serving static files is relatively cheap, but not free. SSD access times are in the tens of milliseconds, and that’s before you pay the filesystem tax. Bots also like to grab old and obscure pages, ones that are unlikely to be in cache. As a result, it doesn’t take all that many requests to bog down the server.

Then there’s the matter of bandwidth: Many blog posts also include images weighing hundreds to thousands of kB, which can add up quite quickly. With an average file size of 100 kB, 4 requests per second is 400 kB every second, which adds up to about a terabyte each month – not a huge amount of data, but more than I’m willing to throw away.

The ban hammer:

Simply making a list of IPs and blocking them would work for normal bots…

… but these are hardly normal bots. Because they are backed by billion-dollar companies, they don’t just have a few addresses, but many thousands. If you manage to ban all of their addresses, they’ll just buy more.

Rate limits fail for the same reason: They just switch IPs. I’ve even seen them using a new IP for each request.

Building a wall:

Ok, what about a pay-wall, login-wall, CAPTCHA-wall, or a hash-based proof-of-work?

All of these inconvenience users.
Requiring an account guarantees that no one wil...