Spatio-temporal indexing the Bluesky firehose

Summary

Joel Gustafson, 2025-08-07

I recently added a "spatial feed" to Aurora, my map of Bluesky. Now, in addition to seeing community clusters laid out on a giant map, you can also see a real-time feed of posts from just the accounts currently in view. This works smoothly at all scales: you can see the most recent posts from the entire network when zoomed all the way out, and local posts from any neighborhood when zoomed in.

How does this work?

This is actually the first (and only) backend service that I've had to deploy for this project. To compute the clustering and layout for the map, I index the follow graph in a SQLite database that only lives on my home server, do all the data processing locally, and at the end just push static assets to a Cloudflare R2 bucket, which the web client fetches directly. (Check out my previous post about building Aurora using WebGPU and UMAP!)

But adding spatial feeds means having the web client make constant queries for post URIs in arbitrary map areas, which it can then "hydrate" into post content from the Bluesky API directly. I didn't want to expose public ports from my home server, so that means deploying a firehose consumer to the cloud.

What does this firehose consumer need to do?

It receives a stream of posts via WebSocket from a Jetstream endpoint, and needs to index them in some way that supports arbitrary spatial queries. We can give the indexer access to a local SQLite database with the current map coordinates of each user, which only changes when I release monthly snapshots of the map. This means we just need to implement a simple observe/query interface.
type Post = {
  id: number
  x: number
  y: number
}

type Area = {
  minX: number
  maxX: number
  minY: number
  maxY: number
}

class Indexer {
  public constructor(bounds: Area) { }
  public observe(post: Post): void { }
  public query(area: Area, limit: number): Post[] { }
}

Furthermore, we would like the resulting Post[] array to be the most re...
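
To make the observe/query contract concrete, here is a minimal brute-force sketch of the same interface. It is a baseline for illustration only, not the indexing scheme the post describes. The class name NaiveIndexer and the helper contains are hypothetical, and the sketch assumes that query should return the newest matching posts first and that area bounds are inclusive.

```typescript
type Post = { id: number; x: number; y: number }
type Area = { minX: number; maxX: number; minY: number; maxY: number }

// Hypothetical helper: true when the post's coordinates lie inside the area
// (bounds treated as inclusive, which is an assumption of this sketch).
function contains(area: Area, post: Post): boolean {
  return (
    post.x >= area.minX && post.x <= area.maxX &&
    post.y >= area.minY && post.y <= area.maxY
  )
}

// Brute-force baseline: posts are stored in arrival order, and a query scans
// backwards from the newest post until `limit` matches have been collected.
class NaiveIndexer {
  private readonly bounds: Area
  private readonly posts: Post[] = []

  public constructor(bounds: Area) {
    this.bounds = bounds
  }

  public observe(post: Post): void {
    // Drop posts whose coordinates fall outside the map bounds.
    if (contains(this.bounds, post)) this.posts.push(post)
  }

  public query(area: Area, limit: number): Post[] {
    const result: Post[] = []
    for (let i = this.posts.length - 1; i >= 0 && result.length < limit; i--) {
      const post = this.posts[i]
      if (contains(area, post)) result.push(post)
    }
    return result
  }
}

// Usage: three posts observed in order, then a query over the lower-left corner.
const indexer = new NaiveIndexer({ minX: 0, maxX: 100, minY: 0, maxY: 100 })
indexer.observe({ id: 1, x: 10, y: 10 })
indexer.observe({ id: 2, x: 90, y: 90 })
indexer.observe({ id: 3, x: 12, y: 8 })
const recent = indexer.query({ minX: 0, maxX: 20, minY: 0, maxY: 20 }, 10)
console.log(recent.map((post) => post.id)) // newest first: 3, then 1
```

This baseline is fast when the query area covers most of the map, because the scan stops as soon as it has collected `limit` posts. When the area is small or sparsely populated, the scan may walk the entire history before finding enough matches, and the array grows without bound. Those are the costs a dedicated spatial index has to avoid.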

First seen: 2025-08-07 22:25

Last seen: 2025-08-08 01:26