Why Computer-Use Agents Should Think Less

https://news.ycombinator.com/rss Hits: 15

Summary

Teaching GPT-5 to Use a Computer

Over the weekend, I placed third at OpenAI's GPT-5 Hackathon with Archon, a copilot for your computer. It pairs a mini vision model for speed with GPT-5, whose variable reasoning handles planning. I took some time to write about how it works, our approach to building a self-driving computer with inference math, and the tradeoffs we made.

Archon is a small bar that sits at the bottom of your Mac/Windows screen where you can type what you want your computer to do in natural language. It takes screenshots to see what's on screen, uses GPT-5's reasoning to plan, then a custom fine-tuned model executes clicks and keystrokes. In a racing game demo, given the single instruction 'start playing', it recognized the view, used WASD, and navigated the track. Although it didn't win this time due to latency, its instruction-following ability was clearly superior to prior models. The goal is to make a copilot that makes computers self-driving. Archon is a lightweight client demonstrating that GPT-5's powerful reasoning combined with tiny fine-tuned models can control any interface through natural language.

[Video: full demo, sped up 2x]

GPT-5: Why it worked for us

Archon was built entirely using GPT-5's advanced reasoning capabilities. We leveraged probably every aspect of GPT-5, from initial development to debugging to training. Codex CLI with GPT-5 with High Thinking enabled us to build the entire app, and GPT-5 with Vision enabled us to see and perceive the screen. GPT-5's reasoning ability was crucial for instruction following and planning. Having all of these in one model quite simply wasn't possible with any other model.

What makes GPT-5 particularly suited for computer control is its ability to reason through complex multi-step processes while maintaining context across long interactions. Unlike previous models that might hallucinate or lose track of the current state, GPT-5's chain-of-thought reasoning allows it to break down ...
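
The screenshot, plan, and execute flow described in the summary can be sketched roughly as follows. This is a minimal illustration, not Archon's actual code: `capture_screen`, `plan`, `execute`, and `Action` are hypothetical names, and both models are replaced by hard-coded stubs so the example runs without any API access.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    kind: str    # "key" or "click"
    target: str  # key name or a description of the UI element


def capture_screen() -> str:
    # Hypothetical stand-in for taking a screenshot; returns a text description.
    return "racing game: track view"


def plan(instruction: str, screen: str) -> List[str]:
    # Stub for the large reasoning model: turns one instruction into short steps.
    if "playing" in instruction and "racing" in screen:
        return ["press W", "press A", "press D"]
    return []


def execute(step: str) -> Action:
    # Stub for the small, fast model: maps one step to a concrete input event.
    verb, _, target = step.partition(" ")
    kind = "key" if verb == "press" else "click"
    return Action(kind, target)


def run(instruction: str) -> List[Action]:
    # Plan once from the current screen, then execute each step in order.
    screen = capture_screen()
    return [execute(step) for step in plan(instruction, screen)]


if __name__ == "__main__":
    for action in run("start playing"):
        print(action)
```

The split mirrors the tradeoff the summary hints at: the expensive planner is called once per instruction, while the cheap executor is called once per step.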

First seen: 2025-08-17 17:35

Last seen: 2025-08-18 07:40
