INTELLECT-2 Release: The First 32B Parameter Model Trained Through Globally Distributed Reinforcement Learning

We're excited to release INTELLECT-2, the first 32B parameter model trained via globally distributed reinforcement learning. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning language model using fully asynchronous RL across a dynamic, heterogeneous swarm of permissionless compute contributors.

To enable a training run on this unique infrastructure, we built various components from scratch: we introduce PRIME-RL, our training framework purpose-built for distributed asynchronous reinforcement learning, built on top of novel components such as TOPLOC, which verifies rollouts from untrusted inference workers, and SHARDCAST, which efficiently broadcasts policy weights from training nodes to inference workers.

Beyond infrastructure components, we propose modifications to the standard GRPO training recipe and data filtering techniques that were crucial for achieving training stability and ensuring that our model successfully learned its training objective, thus improving upon QwQ-32B.

We open-source INTELLECT-2 along with our code and data, hoping to enable more open research in the field of decentralized training.

Paradigm Shift for Decentralized Training

Test-time compute scaling with reinforcement learning has emerged as a new scaling axis for large language models (LLMs), enabling improvements by allowing models to spend more time reasoning. However, reinforcement learning training is typically centralized, requiring large clusters of co-located GPUs and fast interconnect speeds.
With INTELLECT-2, we showcase a paradigm shift: reinforcement learning is inherently more asynchronous and well suited for decentralized, globally distributed compute.

Training Infrastructure

We introduce the following key open-source infrastructure components for training INTELLECT-2:

PRIME-RL: Fully asynchronous reinforcement learning framework designed for decentral...
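To make the asynchronous setup concrete, here is a toy, purely illustrative simulation of the core idea: inference workers always generate rollouts against the last broadcast policy weights, which may lag the trainer by a step or more, and the trainer only learns from rollouts whose staleness stays within a bound. All names (`async_rl_sim`, `broadcast_every`, `max_staleness`) are hypothetical and do not reflect PRIME-RL's or SHARDCAST's actual APIs.

```python
def async_rl_sim(train_steps, rollouts_per_step, broadcast_every=2, max_staleness=1):
    """Toy model of asynchronous RL with delayed weight broadcasts.

    Workers roll out with the last broadcast weights; the trainer keeps
    stepping regardless, and only accepts rollouts generated within
    `max_staleness` policy versions of its current weights.
    """
    broadcast_version = 0  # last weights shipped to inference workers
    used = dropped = 0
    for step in range(train_steps):
        trainer_version = step  # weights the trainer currently holds
        # Workers generate rollouts against the (possibly stale) broadcast.
        for _ in range(rollouts_per_step):
            staleness = trainer_version - broadcast_version
            if staleness <= max_staleness:
                used += 1
            else:
                dropped += 1
        # New weights exist after the optimizer step, but are only
        # re-broadcast periodically to model network cost.
        if (step + 1) % broadcast_every == 0:
            broadcast_version = step + 1
    return used, dropped
```

With a staleness bound of one policy version, every rollout in this toy run remains usable even though broadcasts happen only every other step; tightening the bound to zero discards half of them, which is the trade-off a fully synchronous setup avoids at the cost of idle GPUs.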
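The GRPO recipe mentioned above needs no learned value function: each rollout's advantage is computed relative to the other rollouts sampled for the same prompt. A minimal sketch of the standard group-relative advantage computation (the post's own modifications to this recipe are not shown here):

```python
import math

def grpo_advantages(rewards):
    """Standard GRPO advantage: z-score of each reward within its group.

    `rewards` holds the scalar rewards of all rollouts sampled for one
    prompt; a rollout is credited for beating its siblings rather than
    an absolute baseline.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]
```

For example, a group with rewards `[1.0, 0.0, 1.0, 0.0]` yields advantages `[1.0, -1.0, 1.0, -1.0]`: the successful rollouts are reinforced, the failed ones penalized, with no critic network required.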