The transformer architecture behind today’s large language models has shown an uncanny ability to generate human-like text. Part of its effectiveness comes from its self-attention mechanism, which allows the model to weigh all the words in an input sequence when generating a response.

The problem comes as conversations get longer. Because the model holds the running sequence in memory as it responds, the cumulative cost of generation grows quadratically. If the size of the context window doubles, the cost of processing the context and generating a response doesn’t just double; it quadruples.

This “quadratic bottleneck” is often behind that frustrating lag between asking the model a question and getting an answer. It also creates a lot of redundant computing. By the time ChatGPT popularized the transformer in 2022, researchers were already searching for alternative architectures.

State-space models (SSMs), and transformers interleaved with SSM layers, have emerged as two possible solutions. IBM Research has just open-sourced its first hybrid experiment: Bamba, a model that can run as quickly as an SSM and process long sequences as skillfully as a transformer. Many of Bamba’s innovations are part of IBM’s next-generation Granite 4.0 models, coming in several months.

By significantly reducing the memory requirements of the transformer’s KV (key-value) cache, Bamba-9B has shown it can run at least twice as fast as transformers of similar size while matching their accuracy. “Everything comes back to the KV cache reduction,” says Raghu Ganti, the IBM researcher leading the project. “More throughput, lower latency, longer context length.”

The most important model you’ve never heard of

State-space models come nowhere close to matching the name recognition of transformers, but they’ve been used for decades to model dynamic systems. “They are the bread and butter of electrical engineering — signal processing, robotics, and control theory,” says Ankit Gupta, an IBM researcher ...
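
To make the “quadratic bottleneck” and the KV-cache pressure concrete, here is a minimal back-of-the-envelope sketch. The model dimensions below (hidden size, layer count, 16-bit values) are illustrative placeholders, not Bamba-9B’s actual configuration, and the formulas are the standard rough estimates for a dense transformer rather than anything taken from the article.

```python
# Rough scaling sketch: self-attention compute grows quadratically with
# context length, while the KV cache grows linearly. All dimensions are
# illustrative placeholders, not the real Bamba-9B configuration.

def attention_flops(seq_len: int, hidden_dim: int = 4096, num_layers: int = 32) -> float:
    # The score matrix (QK^T) and the value mixing each cost roughly
    # seq_len^2 * hidden_dim multiply-accumulates per layer.
    return 2 * num_layers * (seq_len ** 2) * hidden_dim

def kv_cache_bytes(seq_len: int, hidden_dim: int = 4096, num_layers: int = 32,
                   bytes_per_value: int = 2) -> int:
    # One key vector and one value vector are cached per token per layer.
    return 2 * num_layers * seq_len * hidden_dim * bytes_per_value

for n in (4_096, 8_192):
    print(f"context {n:>6}: "
          f"attention ~{attention_flops(n):.2e} FLOPs, "
          f"KV cache ~{kv_cache_bytes(n) / 2**30:.1f} GiB")

# Doubling the context from 4,096 to 8,192 tokens roughly quadruples the
# attention compute but only doubles the KV cache -- it is that ever-growing
# cache that a hybrid design with SSM layers can shrink.
```

Under these placeholder numbers, doubling the context quadruples attention compute and takes the KV cache from about 2 GiB to about 4 GiB per sequence, which is the kind of memory pressure the article attributes Bamba’s speedup to reducing.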