GPT-5-Codex is a better AI researcher than me

https://news.ycombinator.com/rss Hits: 1
Summary

In What’s the strongest AI model you can train on a laptop in five minutes? I tried my hand at answering a silly AI-research question. You can probably guess what it was. I chatted with GPT-5 to help me get started with the Python scripts and to bounce ideas off, but it was still me doing the research. I was coming up with the ideas, running the experiments, and deciding what to do next based on the data. The best model I could train was a 1.8M param transformer which produced output like this: Once upon a time, there was a little boy named Tim. Tim had a small box that he liked to play with. He would push the box to open. One day, he found a big red ball in his yard. Tim was so happy. He picked it up and showed it to his friend, Jane. “Look at my bag! I need it!” she said. They played with the ball all day and had a great time. Since then, OpenAI has released GPT-5-codex, and supposedly uses it (plus Codex, their CLI coding tool) to automate a lot of their product development and AI research. I wanted to try the same thing. Codex-plus-me did a much better job than me alone. Here’s an example of the best output I got from the model I trained with Codex: Once upon a time, in a big forest, there lived a little bunny named Ben. Ben loved to play with his friends in the forest. One day, Ben’s mom saw him and was sad because he couldn’t find his friend. She asked, “Why are you sad, Ben?” Ben said, “I lost my toy. I can’t find it.” Ben wanted to help Ben find his toy. He knew they could fix the toy. He went to Sam’s house and found the toy under a tree. Sam was so happy and said, “Thank you, Ben! You are a very pretty toy!” Ben smiled and said, “Yes, I would love to help you.” They played together all day long. The moral of the story is to help others when they needed it. What was the process like to get there? What vibe research looks like I want to call it “vibe research”. Like “vibe coding”, it’s performing a difficult technical task by relying on the model. I have a b...

First seen: 2025-10-07 13:09

Last seen: 2025-10-07 13:09