You have a large JSON file, and you want to load the data into Pydantic. Unfortunately, this uses a lot of memory, to the point where large JSON files are very difficult to read. What to do? Assuming you're stuck with JSON, in this article we'll cover:

- The high memory usage you get with Pydantic's default JSON loading.
- How to reduce memory usage by switching to another JSON library.
- Going further by switching to dataclasses with slots.

## The problem: 20× memory multiplier

We're going to start with a 100MB JSON file, and load it into Pydantic (v2.11.4). Here's what our model looks like:

```python
from pydantic import BaseModel, RootModel

class Name(BaseModel):
    first: str | None
    last: str | None

class Customer(BaseModel):
    id: str
    name: Name
    notes: str

# Map id to corresponding Customer:
CustomerDirectory = RootModel[dict[str, Customer]]
```

The JSON we're loading looks more or less like this:

```
{
    "123": {
        "id": "123",
        "name": {
            "first": "Itamar",
            "last": "Turner-Trauring"
        },
        "notes": "Some notes about Itamar"
    },
    # ... etc ...
}
```

Pydantic has built-in support for loading JSON, though sadly it doesn't support reading from a file. So we load the file into a string and then parse it:

```python
with open("customers.json", "rb") as f:
    raw_json = f.read()

directory = CustomerDirectory.model_validate_json(raw_json)
```

This is very straightforward. But there's a problem. If we measure peak memory usage, it's using a lot of memory:

```
$ /usr/bin/time -v python v1.py
...
Maximum resident set size (kbytes): 2071620
...
```

That's around 2000MB of memory, 20× the size of the JSON file. If our JSON file had been 10GB, memory usage would be 200GB, and we'd probably run out of memory. Can we do better?

## Reducing memory usage

There are two fundamental sources of peak memory usage when parsing JSON:

1. The memory used during parsing; many JSON parsers aren't careful about memory usage, and use more than necessary.
2. The memory used by the final representation, the objects we're creating.
We'll try to reduce memory usage in each of these two areas.