Memory optimizations to reduce CPU costs

https://news.ycombinator.com/rss Hits: 5
Summary

Imagine that you are given the following task, with a file like this:Name,Department,Salary,JoinDate John Smith,Marketing,75000,2023-01-15 Alice Johnson,Finance,82000,2022-06-22 Bob Lee,Sales,68000,2024-03-10 Emma Davis,HR,71000,2021-09-01You want to turn that into a single list of all the terms in the (potentially very large) file. In other words, you want to turn it into something like this:[ {"term": "Name", "position": 0, "length": 4}, {"term": "Department", "position": 5, "length": 10}, ... {"term": "2021-09-01", "position": 160, "length": 10} ]In other words, there is a single continuous array that references the entire data, and it is pretty efficient to do so. Why we do that doesn’t actually matter, but the critical aspect is that we observed poor performance and high memory usage when using this approach. Let’s assume that we have a total of 10 million rows, or 40,000,000 items. Each item costs us 24 bytes (8 bytes for the Field, 8 bytes for the Position, 4 bytes for the Length, and 4 bytes for padding). So we end up with about 1GB in memory just to store things. We can use Data-Oriented programming and split the data into individual arrays, like so:public string[] Fields; public long[] Positions; public int[] Lengths; public Item Get(int i) => new(Fields[i], Positions[i], Lengths[i]);This saves us about 200 MB of memory, because we can now skip the padding costs by splitting the Item into its component parts.Now, we didn’t account for the memory costs of the Field strings. And that is because all of them use the same exact string instances (only the field names are stored as strings).In terms of memory usage, that means we don’t have 40 million string instances, but just 4. The next optimization is to reduce the cost of memory even further, like so:public string[] FieldsNames; public byte[] FieldIndexes; public long[] Positions; public int[] Lengths; public Item Get(int i) => new( FieldsNames[FieldIndexes[i]], Positions[i], Lengths[i] );Because we know tha...

First seen: 2025-08-26 06:16

Last seen: 2025-08-26 10:17