Bots are overwhelming websites with their hunger for AI data

https://news.ycombinator.com/rss Hits: 1
Summary

Bots harvesting content for AI companies have proliferated to the point that they're threatening digital collections of arts and culture. Galleries, Libraries, Archives, and Museums (GLAMs) say they're being overwhelmed by AI bots – web crawling scripts that visit websites and download data to be used for training AI models – according to a report issued on Tuesday by the GLAM-E Lab, which studies issues affecting GLAMs. GLAM-E Lab is a joint initiative between the Centre for Science, Culture and the Law at the University of Exeter and the Engelberg Center on Innovation Law & Policy at NYU Law.

Based on an anonymized survey of 43 organizations, the report indicates that cultural institutions are alarmed by the aggressive harvesting of their content, which shows no regard for the burden that data-harvesting places on websites. "Bots are widespread, although not universal," the report says. "Of 43 respondents, 39 had experienced a recent increase in traffic. Twenty-seven of the 39 respondents experiencing an increase in traffic attributed it to AI training data bots, with an additional seven believing that bots could be contributing to the traffic."

The surge in bots that gather data for AI training, the report says, often went unnoticed until it became so bad that it knocked online collections offline. "Respondents worry that swarms of AI training data bots will create an environment of unsustainably escalating costs for providing online access to collections," the report says.

The institutions commenting on these concerns have differing views about when the bot surge began. Some report noticing it as far back as 2021, while others only began noticing web scraper traffic this year. Some of the bots identify themselves, but some don't. Either way, the respondents say that robots.txt directives – voluntary behavior guidelines that web publishers post for web crawlers – are not currently effective at controlling bot swarms.
Bot defenses offered by the likes of AWS and Cl...

First seen: 2025-06-17 22:16

Last seen: 2025-06-17 22:16