# SmolLM3: smol, multilingual, long-context reasoner

Small language models are becoming increasingly important as users seek capable models that can be deployed efficiently. The community has produced a fascinating range of capable small models, each pushing the boundaries of what's possible at this scale. With SmolLM3, we're excited to contribute a new competitive, fully open 3B model:

SmolLM3 sits in the efficiency sweet spot. Our 3B model outperforms Llama-3.2-3B and Qwen2.5-3B while staying competitive with larger 4B alternatives (Qwen3 & Gemma3). Beyond the performance numbers, we're sharing exactly how we built it using public datasets and training frameworks.

**Model summary:**

- 3B model trained on 11T tokens, SoTA at the 3B scale and competitive with 4B models
- Instruct model with dual-mode reasoning, supporting think/no_think modes (see the usage sketch at the end of this section)
- Multilingual support for 6 languages: English, French, Spanish, German, Italian, and Portuguese
- Long context up to 128k with NoPE and using YaRN (see the loading sketch at the end of this section)

**The complete recipe:** We're releasing SmolLM3 with our engineering blueprint. It includes architecture details, exact data mixtures showing how we progressively boost performance across domains in a three-stage pretraining approach, and the methodology for building a hybrid reasoning model. Usually, achieving these results would require months of reverse engineering. Instead, we're providing the full methodology.

Whether you're building your own models or want to understand what drives performance at this scale, this blueprint shows the engineering story behind competitive 3B performance. Let's have a look at the pretraining stage.

## Pretraining

SmolLM3 changed both the architecture and the data mixture relative to its predecessors. Let's look at the architecture and training configurations first!

### Architecture and training details

SmolLM3 follows a transformer decoder architecture with tied embeddings similar to SmolLM2, building on the Llama architecture with some key modifications optimized for efficiency ...
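To make the NoPE modification concrete: the idea is to drop rotary position embeddings in a subset of layers, so those layers rely on the causal mask alone, which helps long-context extrapolation while the remaining RoPE layers preserve short-context performance. Below is a minimal sketch of the scheme; the every-4th-layer interval and the function names are illustrative assumptions, not the released configuration.

```python
import torch

def maybe_apply_rope(q, k, layer_idx, rope_fn, nope_interval=4):
    """Apply RoPE in most layers, skip it in NoPE layers.

    nope_interval=4 (every 4th layer carries no explicit positional
    encoding) is an assumed ratio for illustration only.
    """
    if (layer_idx + 1) % nope_interval == 0:
        # NoPE layer: queries/keys are left unrotated; attention relies
        # on the causal mask alone for ordering information.
        return q, k
    # RoPE layer: rotate queries and keys as in a standard Llama block.
    return rope_fn(q), rope_fn(k)

# Toy usage with an identity stand-in for a real RoPE implementation.
q = k = torch.randn(1, 8, 16, 64)  # (batch, heads, seq_len, head_dim)
for layer_idx in range(8):
    q_out, k_out = maybe_apply_rope(q, k, layer_idx, rope_fn=lambda t: t)
```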
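For the dual-mode reasoning mentioned in the model summary, switching between think and no_think happens at the prompt level through the chat template. Here is a minimal usage sketch with transformers, assuming the released template exposes an `enable_thinking` flag (as other hybrid reasoning models do) and that the checkpoint lives at `HuggingFaceTB/SmolLM3-3B`; both are assumptions on our part, so check the model card for the authoritative interface.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM3-3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]

# enable_thinking=True asks the template for an extended reasoning trace
# before the answer; False requests a direct answer (assumed flag name).
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```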
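And for the 128k context, YaRN rescales the rotary embeddings beyond the training context length. A sketch of overriding `rope_scaling` at load time follows; the 2x factor and the 64k original context are illustrative assumptions rather than the released values.

```python
from transformers import AutoModelForCausalLM

# rope_scaling override at load time; the factor and original context
# length below are assumptions for illustration, not the shipped config.
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",  # assumed repo id
    rope_scaling={
        "rope_type": "yarn",
        "factor": 2.0,
        "original_max_position_embeddings": 65536,
    },
)
```

Note that since NoPE layers carry no rotary embedding at all, this rescaling would only affect the RoPE layers.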