Microsoft’s “1‑bit” AI model runs on a CPU only, while matching larger systems

https://arstechnica.com/feed/ Hits: 92
Summary

Does size matter? Memory requirements are the most obvious advantage of reducing the complexity of a model's internal weights. The BitNet b1.58 model can run using just 0.4GB of memory, compared to anywhere from 2 to 5GB for other open-weight models of roughly the same parameter size.

But the simplified weighting system also leads to more efficient operation at inference time, with internal operations that rely much more on simple addition instructions and much less on computationally costly multiplication instructions. Those efficiency improvements mean BitNet b1.58 uses anywhere from 85 to 96 percent less energy than comparable full-precision models, the researchers estimate.

[Video: A demo of BitNet b1.58 running at speed on an Apple M2 CPU.]

By using a highly optimized kernel designed specifically for the BitNet architecture, the BitNet b1.58 model can also run multiple times faster than similar models running on a standard full-precision transformer. The system is efficient enough to reach "speeds comparable to human reading (5-7 tokens per second)" using a single CPU, the researchers write (you can download and run those optimized kernels yourself on a number of ARM and x86 CPUs, or try it using this web demo).

Crucially, the researchers say these improvements don't come at the cost of performance on various benchmarks testing reasoning, math, and "knowledge" capabilities (although that claim has yet to be verified independently). Averaging the results across several common benchmarks, the researchers found that BitNet "achieves capabilities nearly on par with leading models in its size class while offering dramatically improved efficiency."

[Chart: Despite its smaller memory footprint, BitNet still performs similarly to "full precision" weighted models on many benchmarks.]
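To see why ternary weights trade multiplications for additions, here is a minimal sketch in Python/NumPy. This is an illustration of the general idea, not Microsoft's optimized bitnet.cpp kernel: when every weight is -1, 0, or +1, each output element of a matrix-vector product is just a sum of selected (possibly negated) inputs, with no multiplications by weight values. The same arithmetic also explains the memory figure: roughly 2 billion weights at ~1.58 bits each (log2 of 3 states) works out to about 0.4GB.

```python
import numpy as np

def ternary_matvec(W, x):
    """Compute W @ x where W contains only -1, 0, +1, using additions only.

    For each row: add the inputs where the weight is +1, subtract the
    inputs where it is -1, and skip the zeros entirely.
    """
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        row = W[i]
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))        # ternary weight matrix
x = rng.standard_normal(8)

# Matches an ordinary (multiply-based) matrix-vector product.
assert np.allclose(ternary_matvec(W, x), W @ x)

# The memory claim is simple arithmetic: ~2e9 ternary weights at
# ~1.58 bits each is about 2e9 * 1.58 / 8 bytes, i.e. roughly 0.4GB,
# versus 2e9 * 16 / 8 = 4GB for the same weights at 16-bit precision.
print(2e9 * 1.58 / 8 / 1e9)  # ~0.4 (GB)
```

A production kernel would pack the ternary weights into a compact bit encoding and vectorize the add/subtract passes, but the arithmetic shape is the same.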
Despite the apparent succe...

First seen: 2025-04-18 20:18

Last seen: 2025-04-22 15:41