What’s the strongest model I can train on my MacBook Pro in five minutes? I’ll give the answer upfront: the best 5-minute model I could train was a ~1.8M-param GPT-style transformer trained on ~20M TinyStories tokens, reaching ~9.6 perplexity on a held-out split. Here’s an example of the output, with the prompt bolded:

> **Once upon a time**, there was a little boy named Tim. Tim had a small box that he liked to play with. He would push the box to open. One day, he found a big red ball in his yard. Tim was so happy. He picked it up and showed it to his friend, Jane. “Look at my bag! I need it!” she said. They played with the ball all day and had a great time.

OK, so it’s not great. But it’s not bad for five minutes!

## The challenge

I’ve been interested in this silly question for a few days. It’s a silly question for two reasons. First, anyone who can afford a MacBook can afford to rent half an hour on an H100 and train a model that’s several orders of magnitude more powerful. Second, if you were forced to train on a weaker device like a laptop, there’s no reason to limit yourself to five minutes (and no reason to think it would even be possible to train a strong model in that time).

Other training challenges like BabyLM restrict the training data, which makes sense: some domains have very little data, so it’s useful to know how to train a model most effectively when data is scarce. It’s also a popular research goal to train the smallest strong model, which also makes sense, since you can run small models on phones and portable devices.

But I can use as much training data as I want, and as large a model as I want. My main limitation is time. In five minutes, you just can’t push that many tokens through a model. That means large models are out of the question, since a larger model takes longer to train per token. Better to train a 1M-param model on 4M tokens than a 1B-param model on 4,000 tokens. But of course you can’t go too small.
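The tradeoff above can be made concrete with a back-of-the-envelope sketch. Under the common approximation that training costs roughly 6·N·D FLOPs for N parameters and D tokens, a fixed time budget means tokens trained scale as 1/N. The FLOP throughput below is an illustrative assumption, not a measured number from this experiment:

```python
# Rough compute-budget sketch: with training cost ~ 6 * N * D FLOPs,
# a fixed budget C lets you train on D = C / (6 * N) tokens.

def tokens_for_budget(n_params: float, flop_budget: float) -> float:
    """Tokens trainable under a fixed FLOP budget (C ~= 6 * N * D)."""
    return flop_budget / (6 * n_params)

# Assume (hypothetically) the laptop sustains ~1e12 FLOP/s for 300 seconds.
budget = 1e12 * 300  # 3e14 FLOPs

for n in (1e6, 1e7, 1e8, 1e9):
    print(f"{n:>7.0e} params -> {tokens_for_budget(n, budget):>13,.0f} tokens")
```

The exact throughput number doesn't matter much; the point is the 1/N scaling, which is why a 1M-param model sees thousands of times more tokens than a 1B-param one in the same wall-clock window.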