As “leaked”, OpenAI cut o3 pricing by 80% today (from $10/$40 per mtok to $2/$8 - matching GPT 4.1 pricing!!) to set the stage of the launch of o3-pro ($20/$80, supporting an unverified community theory that the -pro variants are 10x base model calls with majority voting as referenced in their papers and in our Chai episode). o3-pro reports a 64% win rate vs o3 on human testers and does marginally better on 4/4 reliability benchmarks, but as sama noticed, the actual experience expands when you test it DIFFERENTLY…I’ve had early access to o3 pro for the past week. below are my (early) thoughts:We’re in the era of task-specific models. On one hand, we have “normal” models like 3.5 Sonnet and 4o—the ones we talk to like friends, who help us with our writing, and answer our day-to-day queries. On the other, we have gigantic, slow, expensive, IQ-maxxing reasoning models that we go to for deep analysis (they’re great at criticism), one-shotting complex problems, and pushing the edge of pure intelligence.If you follow me on Twitter, you know I've had a journey with the o-reasoning models. My first impression of o1/o1-pro was quite negative. But as I gritted my teeth through the first weeks, propelled by other people's raving reviews, I realized that I was, in fact, using it wrong. I wrote up all my thoughts, got ratio’ed by @sama, and quote-tweeted by @gdb.The key, I discovered, was to not chat with it. Instead, treat it like a report generator. Give it context, give it a goal, and let it rip. And that's exactly how I use o3 today.But therein lines the problem with evaluating o3 pro.It’s smarter. much smarter.But in order to see that, you need to give it a lot more context. and I’m running out of context.There was no simple test or question i could ask it that blew me away.But then I took a different approach. My co-founder Alexis and I took the the time to assemble a history of all of our past planning meetings at Raindrop, all of our goals, even record voice memos: and t...
First seen: 2025-06-12 16:49
Last seen: 2025-06-13 12:52