You Wouldn't Download a Hacker News

https://news.ycombinator.com/rss
Summary

TLDR: I Did Download It. And now I can analyze it with DuckDB. Behold the fraction of total comments and stories referencing key topics over time!

[Chart: "The Rise Of Rust". Series: avg_python_12w, avg_javascript_12w, avg_java_12w, avg_ruby_12w, avg_rust_12w. X axis: 5/14/2007 to 5/14/2024. Y axis: 0 to 0.08.]

[Chart: "The Progression of Postgres". Series: avg_mysql_12w, avg_postgres_12w, avg_mongo_12w, avg_redis_12w, avg_sqlite_12w. X axis: 5/14/2007 to 5/14/2024. Y axis: 0 to 0.01.]

Part 1: The Mods Are Asleep, Download It All

As part of building hn.unlurker.com, I wrote an HN API client. There are already a bunch of other clients, but I wanted to try the latest Go features and linters on a new project. I’m glad I did; it was a lot of fun.

The client can retrieve active items, lists of items, etc. (comments and stories are called “items” in the HN API). Although I only really needed recent items for my project, for completeness I added “scan”, which downloads all the items in order, from zero to the latest or the other way around.

I wondered: could I just download the whole thing? Extrapolating from a few thousand items, it would only be tens of GiB of JSON. I thought I’d give it a try.

    hn scan --no-cache --asc -c- -o full.json

I had to CTRL-C a stalled download a few times, but scan is resumable, so after a few hours I was done. I had a 20 GiB JSON file of everything that has ever happened on Hacker News, and I can just re-run the command above to “top it off” any time I need the latest. But what could I do with it?

Part 2: Feed The Ducks

First I just grepped for things. How many times has the phrase “correct horse battery staple” appeared on the site? Quite a few: 231 times (the last one just today).
But grepping stuff is ...

First seen: 2025-04-30 05:26

Last seen: 2025-04-30 17:28