Codex, Opus, Gemini try to build Counter Strike

https://news.ycombinator.com/rss Hits: 17
Summary

In the last week we’ve had three major model updates: Gemini 3 Pro, Codex Max 5.1, Claude Opus 4.5. We thought we’d give them a challenge: Build a basic version of Counter Strike. The game had to have a 3D UI and it had to be multiplayer.

If you're curious, pop open an (ideally large) computer screen and you can try out each model's handiwork yourself:

Codex Max 5.1: https://cscodex.vercel.app/
Claude Opus 4.5: https://csclaude.vercel.app/
Gemini 3 Pro: https://csgemini.vercel.app/

We have a full video of us going through the build here, but for those who prefer text, you get this post. We'll go over some of our high-level impressions of each model, then dive deeper into the performance of specific prompts.

The Setup

We signed up for the highest-tier plan on each model provider and used the defaults set for their CLI. For Codex, that’s 5.1 codex-max on the medium setting. For Claude, it’s Opus 4.5. And with Gemini, it's 3 Pro.

We then gave each model about 7 consecutive prompts. Prompts were divided into two categories:

Frontend: At first, agents only had to worry about the game mechanics. Design the scene, the enemies, the logic for shooting, and some sound effects.

Backend: Once that was done, agents would then make the game multiplayer. They would need to build a selection of rooms. Users could join them and start shooting.

A High-Level Overview

So, how'd each model do? As is a familiar tune with the other Anthropic models, Opus 4.5 won out on the frontend. It made nicer maps, nicer characters, nicer guns, and generally had the right scene from the get-go.

Once the design was done, Gemini 3 Pro started to win on the backend. It got fewer errors adding multiplayer and persistence. In general, Gemini did best at making logical rather than visual changes.

Codex Max felt like an “in-between” model on both frontend and backend. It got a lot of “2nd place” points in our book. It did reasonably well on the frontend and reasonably well on the backend, but felt less spiky...

First seen: 2025-12-01 23:52

Last seen: 2025-12-02 15:54