GPT-OSS vs. Qwen3 and a detailed look at how things evolved since GPT-2

https://news.ycombinator.com/rss Hits: 23
Summary

OpenAI just released their new open-weight LLMs this week: gpt-oss-120b and gpt-oss-20b, their first open-weight models since GPT-2 in 2019. And yes, thanks to some clever optimizations, they can run locally (but more about this later).

This is the first time since GPT-2 that OpenAI has shared a large, fully open-weight model. Earlier GPT models showed how the transformer architecture scales. The 2022 ChatGPT release then made these models mainstream by demonstrating concrete usefulness for writing and knowledge (and later coding) tasks. Now they have shared the long-awaited open-weight models, and the architecture has some interesting details.

I spent the past few days reading through the code and technical reports to summarize the most interesting details. (Just days after, OpenAI also announced GPT-5, which I will briefly discuss in the context of the gpt-oss models at the end of this article.)

Below is a quick preview of what the article covers. For easier navigation, I recommend using the Table of Contents on the left of the article page.

- Model architecture comparisons with GPT-2
- MXFP4 optimization to fit gpt-oss models onto single GPUs (a rough memory estimate follows at the end of this excerpt)
- Width versus depth trade-offs (gpt-oss vs. Qwen3)
- Attention bias and sinks
- Benchmarks and comparisons with GPT-5

I hope you find it informative!

Before we discuss the architecture in more detail, let's start with an overview of the two models, gpt-oss-20b and gpt-oss-120b, shown in Figure 1 below.

Figure 1: The two gpt-oss models side by side.

If you have looked at recent LLM architecture diagrams before, or read my previous Big Architecture Comparison article, you may notice that there is nothing novel or unusual at first glance. This is not surprising, since leading LLM developers tend to use the same base architecture and then apply smaller tweaks. This is pure speculation on my part, but I think this is because:

1. There is significant rotation of employees between these labs.
2. We still have not found anything better than the transformer architecture...
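To make the MXFP4 point above concrete, here is a minimal back-of-the-envelope sketch of the memory arithmetic. It assumes the OCP Microscaling layout (4-bit elements plus one shared 8-bit scale per 32-element block, i.e. roughly 4.25 bits per parameter) and, simplistically, treats all weights as MXFP4; in gpt-oss it is chiefly the MoE expert weights, which make up the bulk of the parameters, that are quantized this way.

```python
# Back-of-the-envelope weight memory under MXFP4 (illustrative only).
# Assumption from the OCP Microscaling spec: 4-bit elements plus one
# shared 8-bit scale per 32-element block -> 4 + 8/32 = 4.25 bits/param.
# Simplification: treats ALL weights as MXFP4; in gpt-oss it is mainly
# the MoE expert weights (most of the parameters) that are quantized.

def mxfp4_weight_gb(n_params: float, bits_per_param: float = 4.25) -> float:
    """Approximate weight-only footprint in GB (ignores KV cache, activations)."""
    return n_params * bits_per_param / 8 / 1e9

for name, n_params in [("gpt-oss-20b", 20e9), ("gpt-oss-120b", 120e9)]:
    print(f"{name}: ~{mxfp4_weight_gb(n_params):.0f} GB of weights")
# gpt-oss-20b:  ~11 GB -> within reach of a 16 GB consumer GPU
# gpt-oss-120b: ~64 GB -> fits a single 80 GB GPU
```

The takeaway is that the per-parameter cost drops from 16 bits (bf16) to roughly 4.25 bits, which is what makes single-GPU and local deployment plausible in the first place.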

First seen: 2025-08-10 16:43

Last seen: 2025-08-11 14:49