I keep hearing about what a cash incinerator AI is, especially around inference. While it seems reasonable on the surface, I've often been wary of these kinds of claims, so I decided to do some digging. I haven't seen anyone really try to deconstruct the costs of running inference at scale, and the economics really interest me.

This is really napkin math. I don't have any experience running frontier models at scale, but I do know a lot about the costs and economics of running very high-throughput services in the cloud, as well as the absolutely crazy margins the hyperscalers charge versus bare metal. Corrections are most welcome.

## Some assumptions

I'm only going to look at raw compute costs. This is obviously a complete oversimplification, but given how useful the current models are, even assuming no improvements, I want to stress-test the idea that everyone is losing so much money on inference that it is completely unsustainable.

I've taken the cost of a single H100 at $2/hour. This is actually more than the current retail on-demand rental price, and I would hope the large AI firms are able to get these for a fraction of this price. Secondly, I'm going to use the architecture of DeepSeek R1 as the baseline: 671B total parameters with 37B active via mixture of experts. Given that it gets somewhat similar performance to Claude Sonnet 4 and GPT-5, I think it's a fair assumption to make.

## Working Backwards: H100 Math From First Principles

### Production Setup

Let's start with a realistic production setup. I'm assuming a cluster of 72 H100s at $2/hour each, giving us $144/hour in total costs. For production latency requirements, I'm using a batch size of 32 concurrent requests per model instance, which is more realistic than the massive batches you might see in benchmarks. With tensor parallelism across 8 GPUs per model instance, we can run 9 model instances simultaneously across our 72 GPUs.

### Prefill Phase (Input Processing)

The H100 has about 3.35 TB/s of HBM bandwidth per GPU, whi...
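To make the arithmetic concrete, here's a minimal Python sketch of the setup above. The inputs ($2/hour per H100, 72 GPUs, 8-way tensor parallelism, batch size 32, 3.35 TB/s HBM bandwidth, 37B active parameters) are the assumptions stated in the post; the decode bound at the end is just one common illustrative way to use the bandwidth number, assuming FP16 weights and a purely bandwidth-bound decode, and it ignores KV-cache reads and inter-GPU communication, so treat it as an upper bound.

```python
# Napkin math for the production setup described above. All inputs are
# the post's stated assumptions, not measured numbers.

GPU_COST_PER_HOUR = 2.00   # $/hour per H100 (above retail on-demand, per the post)
NUM_GPUS = 72              # total cluster size
TP_DEGREE = 8              # GPUs per model instance (tensor parallelism)
BATCH_SIZE = 32            # concurrent requests per model instance

cluster_cost_per_hour = NUM_GPUS * GPU_COST_PER_HOUR    # $144/hour
model_instances = NUM_GPUS // TP_DEGREE                 # 9 instances
concurrent_requests = model_instances * BATCH_SIZE      # 288 requests in flight

print(f"Cluster cost:        ${cluster_cost_per_hour:.0f}/hour")
print(f"Model instances:     {model_instances}")
print(f"Concurrent requests: {concurrent_requests}")

# Illustrative upper bound, not the post's method: if decode is purely
# memory-bandwidth bound, each forward pass must stream the ~37B active
# params (FP16, ~2 bytes each) from HBM, and each pass yields one token
# per in-flight request in the batch. KV-cache traffic and communication
# overhead are ignored here.
HBM_BW_PER_GPU = 3.35e12          # bytes/s of HBM bandwidth per H100
ACTIVE_PARAMS = 37e9              # DeepSeek R1 active params per token
BYTES_PER_PARAM = 2               # FP16 weights

bytes_per_pass = ACTIVE_PARAMS * BYTES_PER_PARAM    # ~74 GB streamed per forward pass
instance_bw = HBM_BW_PER_GPU * TP_DEGREE            # 8-way TP aggregates HBM bandwidth
passes_per_sec = instance_bw / bytes_per_pass       # ~362 forward passes/s, upper bound
tokens_per_sec = passes_per_sec * BATCH_SIZE        # ~11,600 tokens/s per instance, upper bound

print(f"Decode upper bound:  {tokens_per_sec:,.0f} tokens/s per instance")
```

Running it reproduces the $144/hour, 9 instances, and 288 concurrent requests above, plus a rough ceiling of around 11,600 tokens per second per instance under those assumptions.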