Bayes, bits & brains

This site is about probability and information theory. We'll see how they help us understand machine learning and the world around us.

A few riddles

More about the content, prerequisites, and logistics later. For now, I hope you get a feel for what this is about by checking out the following riddles. I hope some of them nerd-snipe you! 😉 You will understand all of them by the end of this minicourse.

Test your intelligence with the following widget! You will be given a bunch of text snippets cut from Wikipedia at a random place. Your job: predict the next letter! Try at least five snippets and compare your performance with some neural nets (GPT-2 and Llama 4). Don't feel bad if a machine beats you; they've been studying for this test their entire lives! But why? And why did Claude Shannon - the information theory GOAT - run this experiment back in the 1940s?

🌐 How much information is on Wikipedia?

Onboarding

As we go through the minicourse, we'll revisit each puzzle and understand what's going on. But more importantly, we will pick up some important pieces of mathematics and build a solid theoretical background for machine learning. Here are some of the questions we will explore:

- What are KL divergence, entropy, and cross-entropy? What's the intuition behind them? (chapters 1-3)
- Where do the machine-learning principles of maximum likelihood and maximum entropy come from? (chapters 4-5)
- Why do we use logits, softmax, and Gaussians all the time? (chapter 5)
- How do we set up loss functions? (chapter 6)
- How does compression work, and what intuitions does it give us about LLMs? (chapter 7)

What's next?

This is your last chance. You can go on with your life and believe whatever you want to believe about KL divergence. Or you can go to the first chapter and see how far the rabbit hole goes.