Contextualization Machines

March 10, 2025

Introduction

This post is meant to be an illustration of my mental model of a transformer, a sort of synthesis of a bunch of thoughts and ideas I've had over the past few months. It assumes familiarity with the transformer architecture.

I see a lot of people say transformers are next-token predictors. And while that's true for how LLMs and other models may work, I feel like it still doesn't give you a good mental model of how the transformer operates. Over time, I've developed a mental model that helps me make sense of transformer behavior: I view them fundamentally as contextualization machines. After all, next-token prediction is a learning objective, not an architecture. That said, I mainly work with LLMs, so this post will still focus heavily on them (decoder-only architectures and all that).

As we go through the post, we'll examine each component of a transformer through the lens of contextualization. I'll illustrate the mental model by showing how it helps frame different research results and papers.

When I talk about contextualization, I mean contextualization of tokens and hidden states. One view of the decoder-only transformer that I find useful is to think of the residual chain as the main backbone of the model and the layers as additive transformations, rather than thinking of the main flow of states through the layers as the backbone and the residuals as anchoring states or something along those lines. Here's a diagram to illustrate what I mean when I say this.

In a sense, each layer's transformation of the hidden states can be viewed as a contextualization operation applied to the embedding, and that contextualization is then added back onto the token representation. If you graph out correlations (as cosine similarity) of hidden states between layers, you'll see that the hidden states after each layer are pretty similar to the hidden states before ...
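To make the residual-stream-as-backbone view concrete, here is a minimal sketch of a pre-norm decoder block, written so that the residual stream is the thing carried forward and each sublayer just writes an additive update into it. This is an illustrative simplification (causal masking and other details omitted), not any particular model's implementation; the attention and MLP modules stand in for whatever a real block uses.

import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Toy pre-norm decoder block: the residual stream is the backbone,
    and each sublayer adds its contextualization back onto it."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention reads from the stream and writes its contextualization back in.
        # (Causal masking is omitted here for brevity.)
        normed = self.ln1(x)
        attn_out, _ = self.attn(normed, normed, normed, need_weights=False)
        x = x + attn_out
        # The MLP does the same: read the stream, transform, add the update back.
        x = x + self.mlp(self.ln2(x))
        return x

Written this way, the "state" that flows through the network is the residual stream itself; the layers only ever add to it.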
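And here is a rough sketch of the kind of measurement behind that last observation, assuming a Hugging Face transformers causal LM (gpt2 is used purely as a small, convenient example): compare the residual stream before and after each layer with cosine similarity.

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any decoder-only model works; gpt2 is just small and easy to run.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("Transformers are contextualization machines.", return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of length n_layers + 1: the embeddings,
# then the residual stream after each layer, all shaped [batch, seq, d_model].
states = out.hidden_states

# Cosine similarity between the stream before and after each layer,
# averaged over token positions.
for layer, (before, after) in enumerate(zip(states[:-1], states[1:])):
    sim = F.cosine_similarity(before, after, dim=-1).mean().item()
    print(f"layer {layer:2d}: mean cosine similarity = {sim:.3f}")

If the picture above holds, the similarities stay fairly high across most layers, consistent with each layer contributing a comparatively small additive update to the stream rather than rewriting it wholesale.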