The Big LLM Architecture Comparison


Summary

It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek-V3 and Llama 4 (2024-2025), one might be surprised at how structurally similar these models still are.

Sure, positional embeddings have evolved from absolute to rotary (RoPE), Multi-Head Attention has largely given way to Grouped-Query Attention, and the more efficient SwiGLU has replaced activation functions like GELU. But beneath these minor refinements, have we truly seen groundbreaking changes, or are we simply polishing the same architectural foundations?

Comparing LLMs to determine the key ingredients that contribute to their good (or not-so-good) performance is notoriously challenging: datasets, training techniques, and hyperparameters vary widely and are often not well documented.

However, I think that there is still a lot of value in examining the structural changes of the architectures themselves to see what LLM developers are up to in 2025. (A subset of them are shown in Figure 1 below.)

Figure 1: A subset of the architectures covered in this article.

So, in this article, rather than writing about benchmark performance or training algorithms, I will focus on the architectural developments that define today's flagship open models.

(As you may remember, I wrote about multimodal LLMs not too long ago; in this article, I will focus on the text capabilities of recent models and leave the discussion of multimodal capabilities for another time.)

Tip: This is a fairly comprehensive article, so I recommend using the navigation bar to access the table of contents (just hover over the left side of the Substack page).

As you have probably heard more than once by now, DeepSeek R1 made a big impact when it was released in January 2025. DeepSeek R1 is a reasoning model built on top of the DeepSeek V3 architecture, which was introduced in December 2024.

While my focus here is on architectures released in 2025, I think it’s reasonable...

First seen: 2025-07-20 08:31

Last seen: 2025-07-20 13:32

