Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?

Summary

Sam Altman said yesterday that ChatGPT handles ~700M weekly users. Meanwhile, I can't even run a single GPT-4-class model locally without insane VRAM or painfully slow speeds.

Sure, they have huge GPU clusters, but there must be more going on: model optimizations, sharding, custom hardware, clever load balancing, etc. What engineering tricks make this possible at such massive scale while keeping latency low?

Curious to hear insights from people who've built large-scale ML systems.
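To make the question concrete, here is a toy sketch of one trick the post alludes to: request batching. Instead of running one model call per user, serving systems amortize each expensive GPU forward pass over many concurrent requests. Everything below is illustrative (the names `fake_forward_pass`, `MAX_BATCH`, and the queue-draining loop are invented for this sketch); real serving stacks do far more, such as continuous batching at the token level.

```python
from collections import deque

MAX_BATCH = 8  # illustrative cap on requests per forward pass

def fake_forward_pass(prompts):
    """Stand-in for a single GPU forward pass over a whole batch."""
    return [f"completion for: {p}" for p in prompts]

def serve(queue):
    """Drain pending prompts, batching up to MAX_BATCH per pass."""
    results, passes = [], 0
    while queue:
        batch = [queue.popleft() for _ in range(min(MAX_BATCH, len(queue)))]
        results.extend(fake_forward_pass(batch))
        passes += 1
    return results, passes

# 20 pending requests get served in 3 passes (8 + 8 + 4), not 20.
queue = deque(f"prompt-{i}" for i in range(20))
results, passes = serve(queue)
```

The point of the sketch: throughput scales with batch size almost for free on a GPU, because a batched matrix multiply costs little more than a single one, which is part of why a provider's cost per user is so much lower than running the same model solo.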

First seen: 2025-08-08 20:29

Last seen: 2025-08-09 01:31