Generating Pixels One by One

https://news.ycombinator.com/rss
Summary

Your First Autoregressive Image Generation Model

We’ll build a basic autoregressive model using a simple MLP to generate images of handwritten digits, focusing on the core concept of predicting the next pixel based on its predecessors. It’s a hands-on exploration of fundamental generative AI using a pretty simple model. The model we will train will be far from the state of the art, but it will be a good starting point for understanding how autoregressive models work.

Welcome, I am glad you are here! I’m Tuna. My world is pretty much all about image and video generation. It is what I focus on in my PhD and during my internships at places like Adobe (working on Firefly!) and Amazon AGI. For a while, I have been working with diffusion-based models, and I know that they are incredibly powerful. But the landscape of generative modeling is always growing, and I want to explore other types of generative models. Right now, I am diving into autoregressive models.

I always find the best way to learn a topic is by trying to teach it to others. So, this blog post series is an attempt to teach myself the basics of autoregressive models, hoping you can learn something from it, too. I’ll start with the basics and try to understand how these models work piece by piece.

What Makes a Model “Autoregressive”?

Alright, “autoregressive”. Let’s break it down with some mathematical intuition. You have already seen autoregressive models in action, even if you didn’t call them that. At its heart, the term means predicting the next outcome based on everything that came before it. Think about how you type on your phone. When you write “the weather is …”, the keyboard suggests completions based on the words you entered, such as “sunny”, “rainy”, or “perfect for research” (maybe not that last one). That’s an autoregressive model in action for language.
Mathematically, for a sequence (x_1, x_2, …, x_T), an autoregressive mode...
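
To make the "predict the next element from its predecessors" idea concrete, here is a minimal Python sketch on a toy binary pixel sequence. It is not the MLP from the post: next_pixel_prob is a hypothetical stand-in for a trained model, and its 0.8 / 0.2 values are made up for illustration. The sketch assumes the standard chain-rule view, where the probability of a whole sequence is the product of each element's probability given the elements before it.

```python
import math
import random


def next_pixel_prob(prefix):
    """Hypothetical conditional p(x_t = 1 | x_1, ..., x_{t-1}) for binary pixels.

    Stand-in for a trained model: each pixel tends to copy its predecessor.
    The numbers are illustrative assumptions, not learned values.
    """
    if not prefix:
        return 0.5
    return 0.8 if prefix[-1] == 1 else 0.2


def sequence_log_prob(pixels):
    """Chain rule: log p(x) = sum over t of log p(x_t | x_1, ..., x_{t-1})."""
    total = 0.0
    for t, x in enumerate(pixels):
        p_one = next_pixel_prob(pixels[:t])
        total += math.log(p_one if x == 1 else 1.0 - p_one)
    return total


def sample_sequence(length, rng):
    """Generate pixels one at a time, each conditioned on those already drawn."""
    pixels = []
    for _ in range(length):
        p_one = next_pixel_prob(pixels)
        pixels.append(1 if rng.random() < p_one else 0)
    return pixels


rng = random.Random(0)
sample = sample_sequence(8, rng)
print(sample, math.exp(sequence_log_prob(sample)))
```

Because every conditional is a valid distribution over the next pixel, the probabilities of all possible sequences of a given length sum to one. Swapping next_pixel_prob for a learned network would leave the sampling loop unchanged.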

First seen: 2025-06-08 22:16

Last seen: 2025-06-09 08:17