INTELLECT-2 Release: The First 32B Parameter Model Trained Through Globally Distributed Reinforcement Learning

We're excited to release INTELLECT-2, the first 32B parameter model trained via globally distributed reinforcement learning. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning language model using fully asynchronous RL across a dynamic, heterogeneous swarm of permissionless compute contributors.

To enable a training run on this unique infrastructure, we built various components from scratch: we introduce PRIME-RL, our training framework purpose-built for distributed asynchronous reinforcement learning, built on top of novel components such as TOPLOC, which verifies rollouts from untrusted inference workers, and SHARDCAST, which efficiently broadcasts policy weights from training nodes to inference workers.

Beyond infrastructure components, we propose modifications to the standard GRPO training recipe and data filtering techniques that were crucial for achieving training stability and ensuring that our model successfully learned its training objective, thus improving upon QwQ-32B.

We open-source INTELLECT-2 along with our code and data, hoping to enable more open research in the field of decentralized training.

Paradigm Shift for Decentralized Training

Test-time compute scaling with reinforcement learning has emerged as a new scaling axis for large language models (LLMs), enabling improvements by allowing models to spend more time reasoning. However, reinforcement learning training is typically centralized, requiring large clusters of co-located GPUs and fast interconnect speeds.
With INTELLECT-2, we showcase a paradigm shift: reinforcement learning is inherently more asynchronous and well suited for decentralized, globally distributed compute.

Training Infrastructure

We introduce the following key open-source infrastructure components for training INTELLECT-2:

PRIME-RL: Fully asynchronous reinforcement learning framework designed for decentral...
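To make the asynchronous setup concrete, here is a toy, purely illustrative simulation of the core idea: inference workers always generate rollouts against the last broadcast policy weights, which may lag the trainer by a step or more, and the trainer only learns from rollouts whose staleness stays within a bound. All names (`async_rl_sim`, `broadcast_every`, `max_staleness`) are hypothetical and do not reflect PRIME-RL's or SHARDCAST's actual APIs.

```python
def async_rl_sim(train_steps, rollouts_per_step, broadcast_every=2, max_staleness=1):
    """Toy model of asynchronous RL with delayed weight broadcasts.

    Workers roll out with the last broadcast weights; the trainer keeps
    stepping regardless, and only accepts rollouts generated within
    `max_staleness` policy versions of its current weights.
    """
    broadcast_version = 0  # last weights shipped to inference workers
    used = dropped = 0
    for step in range(train_steps):
        trainer_version = step  # weights the trainer currently holds
        # Workers generate rollouts against the (possibly stale) broadcast.
        for _ in range(rollouts_per_step):
            staleness = trainer_version - broadcast_version
            if staleness <= max_staleness:
                used += 1
            else:
                dropped += 1
        # New weights exist after the optimizer step, but are only
        # re-broadcast periodically to model network cost.
        if (step + 1) % broadcast_every == 0:
            broadcast_version = step + 1
    return used, dropped
```

With a staleness bound of one policy version, every rollout in this toy run remains usable even though broadcasts happen only every other step; tightening the bound to zero discards half of them, which is the trade-off a fully synchronous setup avoids at the cost of idle GPUs.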
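The GRPO recipe mentioned above needs no learned value function: each rollout's advantage is computed relative to the other rollouts sampled for the same prompt. A minimal sketch of the standard group-relative advantage computation (the post's own modifications to this recipe are not shown here):

```python
import math

def grpo_advantages(rewards):
    """Standard GRPO advantage: z-score of each reward within its group.

    `rewards` holds the scalar rewards of all rollouts sampled for one
    prompt; a rollout is credited for beating its siblings rather than
    an absolute baseline.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]
```

For example, a group with rewards `[1.0, 0.0, 1.0, 0.0]` yields advantages `[1.0, -1.0, 1.0, -1.0]`: the successful rollouts are reinforced, the failed ones penalized, with no critic network required.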