Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?

Summary

Sam Altman said yesterday that ChatGPT handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without insane VRAM or painfully slow speeds.

Sure, they have huge GPU clusters, but there must be more going on: model optimizations, sharding, custom hardware, clever load balancing, etc. What engineering tricks make this possible at such massive scale while keeping latency low?

Curious to hear insights from people who've built large-scale ML systems.
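To make the question concrete, here is a toy sketch of one trick the post alludes to: request batching. Instead of running one model call per user, serving systems amortize each expensive GPU forward pass over many concurrent requests. Everything below is illustrative (the names `fake_forward_pass`, `MAX_BATCH`, and the queue-draining loop are invented for this sketch); real serving stacks do far more, such as continuous batching at the token level.

```python
from collections import deque

MAX_BATCH = 8  # illustrative cap on requests per forward pass

def fake_forward_pass(prompts):
    """Stand-in for a single GPU forward pass over a whole batch."""
    return [f"completion for: {p}" for p in prompts]

def serve(queue):
    """Drain pending prompts, batching up to MAX_BATCH per pass."""
    results, passes = [], 0
    while queue:
        batch = [queue.popleft() for _ in range(min(MAX_BATCH, len(queue)))]
        results.extend(fake_forward_pass(batch))
        passes += 1
    return results, passes

# 20 pending requests get served in 3 passes (8 + 8 + 4), not 20.
queue = deque(f"prompt-{i}" for i in range(20))
results, passes = serve(queue)
```

The point of the sketch: throughput scales with batch size almost for free on a GPU, because a batched matrix multiply costs little more than a single one, which is part of why a provider's cost per user is so much lower than running the same model solo.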

First seen: 2025-08-08 20:29

Last seen: 2025-08-09 01:31