As you may have noticed, SourceHut has deployed Anubis to parts of our services to protect ourselves from aggressive LLM crawlers. Much ink has been spilled on the subject of the LLM problem elsewhere, and we needn’t revisit that here. I do want to take this opportunity, however, to clarify how SourceHut views this kind of scraper behavior more generally, and how we feel that the data entrusted to us by our users ought to be used.

Up until this point, we have offered some quiet assurances to this effect in a few places, notably our terms of service and robots.txt file. Quoting the former:

    You may use automated tools to obtain public information from the services for the purposes of archival or open-access research. You may not use this data for recruiting, solicitation, or profit.

This has been part of our terms of service since they were originally written in 2018. With the benefit of hindsight, I might propose a different wording to better reflect our intentions – but we try not to update the terms of service too often because we have to send all users an email letting them know we’ve done so. I have a proposed edit pending to include in the next batch of changes to the terms which reads as follows:

    You may use automated tools to access public SourceHut data in bulk (i.e. crawlers, robots, spiders, etc) provided that:

    - Your software obeys the rules set forth in robots.txt
    - Your software uses a User-Agent header which clearly identifies your software and its operators, including your contact information
    - Your software requests data at a rate which does not negatively affect the performance of our services for other users

    You may only collect this data for one or more of the following purposes:

    - Search engine indexing
    - Open-access research
    - Archival

    You may not use automated tools to collect SourceHut data for solicitation, profit, training machine learning models, or any other purpose not enumerated here without explicit permission from SourceHut staff.
This text, or som...
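For readers who operate a crawler, the three technical conditions in the proposed terms (obey robots.txt, send an identifying User-Agent, and keep the request rate low) can be sketched roughly as follows. This is only an illustration built on the Python standard library, not SourceHut's own tooling: the bot name, contact details, and the helper names `load_rules`, `Throttle`, and `fetch` are all hypothetical, and the appropriate delay between requests is an assumption you would need to choose conservatively.

```python
import time
import urllib.request
import urllib.robotparser

# Hypothetical identity: replace with your own crawler name and contact details.
USER_AGENT = "example-research-bot/0.1 (+https://example.org/bot; mailto:ops@example.org)"


def load_rules(robots_txt):
    """Parse the text of a robots.txt file that has already been downloaded."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch(url, rules, throttle):
    """Return the response body, or None if robots.txt disallows the URL."""
    if not rules.can_fetch(USER_AGENT, url):
        return None
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

A crawler would download robots.txt once per host, pass its text to `load_rules`, and route every request through `fetch` with a single shared `Throttle`. Note that this covers only the mechanical conditions; the limits on what the collected data may be used for are a matter of policy, not code.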