Watching o3 Model Sweat over a Paul Morphy Mate-in-2

https://news.ycombinator.com/rss
Summary

When I gave OpenAI’s o3 model a tough chess puzzle, it behaved almost like a human: thinking, doubting, retrying, and finally googling the answer. 🤣 Before I break it down step by step, here’s the funniest part: it spent 8 minutes calculating and pixel-measuring squares… and then cheated by using Bing.

How o3 Solves Hard Chess Problems

First, the problem. If you want, you can try to solve it yourself: White to play and checkmate in 2.

This is a famous chess puzzle from Paul Morphy; check out this link for more. See the full conversation here.

1. Careful Board Analysis

First, o3 spent several minutes carefully reconstructing the board from the image:

“The board has White’s King on c8, White’s Rook on a1, and a Pawn on b6. Black’s King on a8, Bishop on b8, and Pawns on a7 and b7.”

It was being very cautious and wanted to fully understand the position before trying any move. That is already very agent-like behavior: care first, then move.

2. First Ideas and Doubts

It tested obvious ideas like b7# and Rxa7#, but immediately started doubting itself:

“After 1. Rxa7+, Black’s King cannot capture because of the White pawn on b6.”
“But after Bxa7, it is no longer check. So it is not mate.”
“1. b7+ is illegal because the pawn on b7 blocks it.”

It realized that the easy-looking first moves did not work and became more careful.

3. Attempt to Use Python

When pure reasoning was not enough, o3 tried to program its way out of the situation:

“I should probably check using something like a chess engine to confirm.”

It then tried to import a chess module and failed with “ModuleNotFoundError”. It wanted to run a simulation but, of course, had no chess engine installed.

4. Pixel-by-Pixel Image Analysis

Still stuck, it tried measuring the chessboard squares manually.
“The board is 726 pixels high, each square is about 88 pixels…” “Coordinates for b7 would be approximately x=88–176, y=88–176.” It was literally trying to deduce the pieces based on pix...
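The pixel arithmetic in step 4 amounts to mapping a square name to a box in image coordinates. Here is a minimal sketch, assuming a border-free board image with the origin at the top-left corner and White at the bottom. `square_bounds` and its parameters are hypothetical names used only for illustration. With the 88 px square size that o3 quoted, b7 lands on x=88–176, y=88–176, which matches its estimate. Note that 726 / 8 is about 90.75 px, so the 88 px figure is only approximate, or the image includes a border.

```python
# Hypothetical helper: assumes a border-free board image, origin at top-left.
def square_bounds(square, square_px=88, white_at_bottom=True):
    """Return the pixel box (x0, x1, y0, y1) of a square such as "b7"."""
    file_idx = "abcdefgh".index(square[0])  # a..h -> 0..7
    rank_idx = int(square[1]) - 1           # 1..8 -> 0..7
    # Columns run left to right; rows run top to bottom in image coordinates.
    col = file_idx if white_at_bottom else 7 - file_idx
    row = 7 - rank_idx if white_at_bottom else rank_idx
    return col * square_px, (col + 1) * square_px, row * square_px, (row + 1) * square_px


print(square_bounds("b7"))  # (88, 176, 88, 176)
```

Cropping each box and classifying its contents is the piece-detection step o3 was attempting by hand.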
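In step 3, o3 wanted a chess engine to confirm its candidates, but for a position this small a dependency-free brute-force search is enough. The following is our own illustrative sketch, not o3's code and not the API of any chess package. Every function name in it is hypothetical. It supports only kings, queens, rooks, bishops and pawns, ignores castling and en passant, and always promotes to a queen. The position is the one o3 described in step 1, written as the FEN placement `kbK5/pp6/1P6/8/8/8/8/R7`. A first move counts as a key if every legal Black reply still allows a mating move. Running it prints the solution, so skip it if you are still working on the puzzle.

```python
# Hypothetical, dependency-free mate-in-2 search: a sketch, not a real engine.
# Assumes only K, Q, R, B, P pieces, no castling/en passant, and auto-queening.
SQUARES = {(f, r) for f in range(8) for r in range(8)}
ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
KING_DIRS = ROOK_DIRS + BISHOP_DIRS
SLIDERS = {"r": ROOK_DIRS, "b": BISHOP_DIRS, "q": KING_DIRS}


def parse_fen(placement):
    """Map a FEN piece-placement field to {(file, rank): piece}, both 0-based."""
    board = {}
    for row, rank_str in enumerate(placement.split("/")):
        file = 0
        for ch in rank_str:
            if ch.isdigit():
                file += int(ch)
            else:
                board[(file, 7 - row)] = ch
                file += 1
    return board


def pseudo_moves(board, white):
    """Yield (from, to) moves, ignoring whether the mover's own king is left in check."""
    for (f, r), p in list(board.items()):
        if p.isupper() != white:
            continue
        kind = p.lower()
        if kind == "p":
            step, start = (1, 1) if white else (-1, 6)
            if (f, r + step) in SQUARES and (f, r + step) not in board:
                yield (f, r), (f, r + step)
                if r == start and (f, r + 2 * step) not in board:
                    yield (f, r), (f, r + 2 * step)
            for t in ((f - 1, r + step), (f + 1, r + step)):
                if t in board and board[t].isupper() != white:
                    yield (f, r), t  # diagonal capture
        elif kind == "k":
            for df, dr in KING_DIRS:
                t = (f + df, r + dr)
                if t in SQUARES and (t not in board or board[t].isupper() != white):
                    yield (f, r), t
        else:  # sliding pieces: rook, bishop, queen
            for sf, sr in SLIDERS[kind]:
                x, y = f + sf, r + sr
                while (x, y) in SQUARES:
                    if (x, y) in board:
                        if board[(x, y)].isupper() != white:
                            yield (f, r), (x, y)
                        break  # path blocked
                    yield (f, r), (x, y)
                    x, y = x + sf, y + sr


def apply(board, move):
    """Play `move` on a copy of the board; pawns reaching the last rank become queens."""
    src, dst = move
    new = dict(board)
    piece = new.pop(src)
    if piece.lower() == "p" and dst[1] in (0, 7):
        piece = "Q" if piece.isupper() else "q"
    new[dst] = piece
    return new


def in_check(board, white):
    """A king is in check if some opposing pseudo-move could capture it."""
    king = next(sq for sq, p in board.items() if p == ("K" if white else "k"))
    return any(dst == king for _, dst in pseudo_moves(board, not white))


def legal_moves(board, white):
    return [m for m in pseudo_moves(board, white) if not in_check(apply(board, m), white)]


def is_mate(board, white):
    return in_check(board, white) and not legal_moves(board, white)


def mate_in_two_keys(board):
    """White first moves after which every Black reply allows mate on the next move."""
    keys = []
    for m1 in legal_moves(board, True):
        b1 = apply(board, m1)
        replies = [apply(b1, r) for r in legal_moves(b1, False)]
        if is_mate(b1, False) or (replies and all(
                any(is_mate(apply(b2, m2), False) for m2 in legal_moves(b2, True))
                for b2 in replies)):
            keys.append(m1)
    return keys


def name(sq):
    return "abcdefgh"[sq[0]] + str(sq[1] + 1)


board = parse_fen("kbK5/pp6/1P6/8/8/8/8/R7")  # the position o3 reconstructed
print([name(a) + name(b) for a, b in mate_in_two_keys(board)])
```

The same helpers reproduce o3's step-2 doubts. `a1a7` (1. Rxa7+) is not a key because Black answers Bxa7, and `b6b7` never appears in `legal_moves(board, True)` because the b7 pawn blocks it. Detecting check by reusing the move generator (a king is in check if the opponent could "capture" it) keeps the sketch short, at the cost of speed.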

First seen: 2025-04-27 18:16

Last seen: 2025-04-28 00:17