The Privacy Theater of Hashed PII

Summary

About once a year, I’m reminded that a lot of marketing SaaS and ad tech dresses up cryptographic hashes as a sort of privacy theater. This shows up frequently in product features for suppression lists, where the general idea is to upload hashed email addresses or phone numbers to enable matching while preserving privacy.

The problem is, hashing PII does not protect privacy. The long and short of it is: hashing only hides the input if the set of possible inputs is effectively unbounded, meaning too large to enumerate. It’s why long and unpredictable passwords are necessary, even with a robust hashing function.

PII is neither long nor unpredictable. You can download every baby name going back to 1880 from the Social Security Administration. Email addresses follow the format something@something.something. Social Security numbers are 9 digits, so there are at most 1 billion. North American phone numbers are 10 digits, so there are at most 10 billion.

Despite this, marketing tools still shuffle around hashes of this data. For example, here’s BambooHR:

"In order to better identify any shared customers we may have, we have decided to compare our customer lists encoded as MD5 Hashes. By encoding our respective customer lists in MD5 Hashes, we will be able to compare customer lists without disclosing any customer info (including customer name)."

And platforms like UnsubCentral:

"Manually entering in phone numbers to a suppression tool is a waste of time and resources. Our tool can take plain text and compare it against MD5 or SHA hashed lists of phone numbers – simply throw in the data, and it will do the hard work for you."

Everyone is trying to do private set intersection, but doing it by passing hashes around is trivially broken on modern consumer hardware. And you don’t even need special password-cracking software to break it. On a laptop, we can build a rainbow table of Parquet files for every North American phone number.
We can abuse DuckDB as a hashing mill, and generate every MD5 in the 5XX area code block: P...
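
To make the attack concrete, here is a minimal Python sketch of the same idea. It is not the author's DuckDB query. It assumes phone numbers are hashed as bare 10-digit ASCII strings with no salt, which is an assumption about how such lists are prepared. The function names are hypothetical, and the example uses the fictional 555-01XX range rather than real numbers.

```python
import hashlib
from typing import Dict, Optional

# Upper bounds on the input space, as stated in the text.
SSN_SPACE = 10 ** 9      # 9 digits
PHONE_SPACE = 10 ** 10   # 10 digits


def md5_hex(value: str) -> str:
    """Return the MD5 hex digest of an ASCII string."""
    return hashlib.md5(value.encode("ascii")).hexdigest()


def build_lookup(area_code: str, exchange: str) -> Dict[str, str]:
    """Precompute digest -> number for one area code + exchange.

    That block holds only 10,000 numbers, so the table is tiny.
    Assumes numbers are hashed as 10 bare digits (no punctuation, no salt).
    """
    prefix = area_code + exchange
    table = {}
    for line in range(10_000):
        number = f"{prefix}{line:04d}"
        table[md5_hex(number)] = number
    return table


def reverse(hashed: str, table: Dict[str, str]) -> Optional[str]:
    """Look up a digest; returns None if the number is outside the table."""
    return table.get(hashed)


# A "private" hash as it might appear in an uploaded suppression list.
table = build_lookup("555", "010")
uploaded_hash = md5_hex("5550100123")
recovered = reverse(uploaded_hash, table)  # the original number, in plaintext
```

Covering all 10 billion numbers is the same loop with a larger range. The result no longer fits comfortably in an in-memory dict, which is presumably why the post writes it out as Parquet files instead.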

First seen: 2025-10-26 08:00

Last seen: 2025-10-26 08:00