Writing an LLM from scratch, part 22 – training our LLM

Summary

This post wraps up my notes on chapter 5 of Sebastian Raschka's book "Build a Large Language Model (from Scratch)". Understanding cross entropy loss and perplexity was the hard part of this chapter for me -- the remaining 28 pages were more a case of plugging bits together and running the code to see what happens.

The shortness of this post almost makes it feel like a damp squib. After writing so much over the last 22 posts, there's really not all that much to say -- but that hides the fact that this part of the book is probably the most exciting to work through. All these pieces, developed with such care and with so much to learn over the preceding 140 pages, with not all that much to show for them -- and suddenly we have a codebase that we can let rip on a training set, and our model starts talking to us!

I trained my model on the sample dataset that we use in the book, the 20,000 characters of "The Verdict" by Edith Wharton, and then ran it to predict next tokens after "Every effort moves you". I got:

    Every effort moves you in," was down surprise a was one of lo "I quote.

Not bad for a model trained on such a small amount of data (in just over ten seconds).

The next step was to download the weights for the original 124M-parameter version of GPT-2 from OpenAI, following the instructions in the book, and then to load them into my model. With those weights, against the same prompt, I got this:

    Every effort moves you as far as the hand can go until the end of your turn unless something interrupts your control flow. As you may observe I

That's amazingly cool -- coherent enough that you could believe it's part of the instructions for a game.

Now, I won't go through the remainder of the chapter in detail -- as I said, it's essentially just plugging together the various bits that we've gone through so far, even though the results are brilliant. In this post I'm just going to make a few brief notes on the things that I found interesting.

Randomness and seeding...
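
As a quick aside on the cross entropy and perplexity point above: the two are tightly coupled, because perplexity is just the exponential of the cross entropy loss. The sketch below is not the book's code -- it's a minimal, purely illustrative PyTorch example using all-zero logits (a model that has learned nothing and so predicts a uniform distribution over the vocabulary):

```python
import torch
import torch.nn.functional as F

vocab_size = 50257           # GPT-2's vocabulary size
batch_size, seq_len = 2, 4   # toy dimensions, purely illustrative

# A model that has learned nothing useful: all-zero logits, i.e. a uniform
# distribution over the vocabulary at every position.
logits = torch.zeros(batch_size, seq_len, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size, seq_len))

# cross_entropy expects (N, C) logits and (N,) targets, so flatten the
# batch and sequence dimensions together.
loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
perplexity = torch.exp(loss)

print(loss.item())        # ln(50257), about 10.8
print(perplexity.item())  # about 50257 -- as uncertain as it's possible to be
```

With a uniform prediction the perplexity equals the vocabulary size; as training drives the cross entropy down, the perplexity drops towards something you can read as "the model is choosing between roughly this many plausible next tokens".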
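
For context on the "just over ten seconds" of training: the chapter's training loop is, at heart, that same cross entropy loss wrapped in an optimizer step, followed by the text-generation function from chapter 4. Here's a heavily simplified sketch of that flow. It assumes the pieces built up in earlier chapters (GPTModel, create_dataloader_v1, generate_text_simple) are importable; the module name previous_chapters, the hyperparameters, and the lack of a validation split are my simplifications and assumptions, not the book's exact code:

```python
import torch
import tiktoken
# Assumed import path -- in the book's repo these come from earlier chapters.
from previous_chapters import GPTModel, create_dataloader_v1, generate_text_simple

GPT_CONFIG_124M = {
    "vocab_size": 50257, "context_length": 256, "emb_dim": 768,
    "n_heads": 12, "n_layers": 12, "drop_rate": 0.1, "qkv_bias": False,
}

tokenizer = tiktoken.get_encoding("gpt2")
with open("the-verdict.txt", "r", encoding="utf-8") as f:  # ~20,000 characters
    text = f.read()

train_loader = create_dataloader_v1(
    text, batch_size=2, max_length=GPT_CONFIG_124M["context_length"],
    stride=GPT_CONFIG_124M["context_length"], shuffle=True, drop_last=True,
)

torch.manual_seed(123)
model = GPTModel(GPT_CONFIG_124M)
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=0.1)

for epoch in range(10):                      # tiny dataset, so a few epochs suffice
    for input_batch, target_batch in train_loader:
        optimizer.zero_grad()
        logits = model(input_batch)          # (batch, seq_len, vocab_size)
        loss = torch.nn.functional.cross_entropy(
            logits.flatten(0, 1), target_batch.flatten()
        )
        loss.backward()
        optimizer.step()

# Generate a continuation of the prompt used above.
model.eval()
prompt_ids = torch.tensor([tokenizer.encode("Every effort moves you")])
out_ids = generate_text_simple(
    model=model, idx=prompt_ids, max_new_tokens=25,
    context_size=GPT_CONFIG_124M["context_length"],
)
print(tokenizer.decode(out_ids.squeeze(0).tolist()))
```

The book's actual training function does more than this -- it tracks a validation split and prints sample generations as it goes, which is what makes watching it run so satisfying -- but the core loop has this shape.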
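
The weight-loading step, again as a hedged sketch: the book provides helper code for downloading the original OpenAI checkpoint and copying its tensors into our model's parameters. The function names and import paths below (download_and_load_gpt2, load_weights_into_gpt) are how I remember the book's repository being organised, so treat them as assumptions. The important detail is that the OpenAI checkpoint expects a 1024-token context window and biases on the query/key/value projections, so the config has to change before the weights will fit:

```python
import torch
import tiktoken
# Assumed import paths -- these helpers ship with the book's chapter 5 code.
from gpt_download import download_and_load_gpt2
from gpt_generate import load_weights_into_gpt
from previous_chapters import GPTModel, generate_text_simple

# Downloads the 124M-parameter checkpoint and returns its config and tensors.
settings, params = download_and_load_gpt2(model_size="124M", models_dir="gpt2")

# The OpenAI weights assume a 1024-token context and qkv biases.
NEW_CONFIG = {
    "vocab_size": 50257, "context_length": 1024, "emb_dim": 768,
    "n_heads": 12, "n_layers": 12,
    "drop_rate": 0.1,   # dropout is inactive in eval mode anyway
    "qkv_bias": True,
}

gpt = GPTModel(NEW_CONFIG)
load_weights_into_gpt(gpt, params)   # copy the checkpoint tensors into our model
gpt.eval()

tokenizer = tiktoken.get_encoding("gpt2")
prompt_ids = torch.tensor([tokenizer.encode("Every effort moves you")])
out_ids = generate_text_simple(
    model=gpt, idx=prompt_ids, max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"],
)
print(tokenizer.decode(out_ids.squeeze(0).tolist()))
```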

First seen: 2025-10-16 00:44

Last seen: 2025-10-16 13:48