# Dummy's Guide to Modern LLM Sampling

## Intro Knowledge

Large Language Models (LLMs) work by taking a piece of text (e.g. a user prompt) and calculating the next word (or, in more technical terms, the next token). LLMs have a vocabulary, or dictionary, of valid tokens and reference it during both training and inference (the process of generating text). More on that below; first you need to understand why we use tokens (sub-words) instead of words or letters.

Before that, a short glossary of some technical terms that aren't explained in depth in the sections below:

### Short Glossary

**Logits:** The raw, unnormalized scores output by the model for each token in its vocabulary. Higher logits indicate tokens the model considers more likely to come next.

**Softmax:** A mathematical function that converts logits into a proper probability distribution - values between 0 and 1 that sum to 1 (a small worked example appears at the end of this section).

**Entropy:** A measure of uncertainty or randomness in a probability distribution. Higher entropy means the model is less certain about which token should come next.

**Perplexity:** Related to entropy, perplexity measures how "surprised" the model is by the text. Lower perplexity indicates higher confidence.

**n-gram:** A contiguous sequence of n tokens. For example, "once upon a" is a 3-gram.

**Context window (or sequence length):** The maximum number of tokens an LLM can process at once, including both the prompt and the generated output.

**Probability distribution:** A function that assigns probabilities to all possible outcomes (tokens) such that they sum to 1. Think of it like percentages: 1% is 0.01, 50% is 0.5, and 100% is 1.0.

## Why tokens?

Your first instinct might be to use a vocabulary of words or letters for an LLM. Instead, we use sub-words: some common words are preserved whole in the vocabulary (e.g. "the" or "apple" might each be a single token because of how common they are in English), while rarer words are fragmented into common sub-words (e.g. "bi-fur-cat-ion"; a toy sketch of this splitting appears below). Why is this? There are several very good reasons:

Why no...
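To make the sub-word idea concrete, here is a minimal toy sketch of longest-match sub-word splitting in Python. The hand-picked vocabulary and the greedy matching rule are purely illustrative assumptions; real tokenizers (BPE, WordPiece, etc.) learn their vocabularies from large amounts of text and differ in the details.

```python
# Toy sub-word tokenizer: greedy longest-match against a tiny, hand-picked vocabulary.
# This only illustrates the idea; real vocabularies are learned from data.

TOY_VOCAB = {"the", "apple", "bi", "fur", "cat", "ion"}

def toy_tokenize(text: str, vocab: set) -> list:
    """Split text into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest possible piece first, then shrink until something matches.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab:
                    tokens.append(piece)
                    start = end
                    break
            else:
                # Nothing matched: fall back to a single character (an "unknown" piece).
                tokens.append(word[start])
                start += 1
    return tokens

print(toy_tokenize("the apple", TOY_VOCAB))    # ['the', 'apple'] - common words stay whole
print(toy_tokenize("bifurcation", TOY_VOCAB))  # ['bi', 'fur', 'cat', 'ion'] - rarer word is fragmented
```

Real tokenizers also treat whitespace, capitalization, and punctuation as part of their tokens, which this sketch ignores.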
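Similarly, to tie a few of the glossary terms together, here is a small sketch (with made-up logit values) of how raw logits become a probability distribution via softmax, and how entropy and perplexity are computed from that distribution.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution (values in [0, 1] that sum to 1)."""
    m = max(logits)                            # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for a tiny 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)

print([round(p, 3) for p in probs])  # roughly [0.61, 0.22, 0.14, 0.03]; higher logit -> higher probability
print(round(sum(probs), 6))          # 1.0 (up to floating-point rounding)

# Entropy: higher when the probabilities are more spread out,
# i.e. when the model is less certain about the next token.
entropy = -sum(p * math.log(p) for p in probs)
print(round(entropy, 3))

# Perplexity is closely related: for a distribution it is exp(entropy).
print(round(math.exp(entropy), 3))
```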