📰 Tech Blog | 📄 Paper Link (coming soon)

## 1. Model Introduction

Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.

### Key Features

- **Large-Scale Training**: Pre-trained a 1T parameter MoE model on 15.5T tokens with zero training instability.
- **MuonClip Optimizer**: We apply the Muon optimizer at an unprecedented scale and develop novel optimization techniques to resolve instabilities while scaling up.
- **Agentic Intelligence**: Specifically designed for tool use, reasoning, and autonomous problem-solving.

### Model Variants

- **Kimi-K2-Base**: The foundation model, a strong start for researchers and builders who want full control for fine-tuning and custom solutions.
- **Kimi-K2-Instruct**: The post-trained model, best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model without long thinking. (A minimal usage sketch appears at the end of this section.)

## 2. Model Summary

| Architecture | Mixture-of-Experts (MoE) |
|---|---|
| Total Parameters | 1T |
| Activated Parameters | 32B |
| Number of Layers (Dense layer included) | 61 |
| Number of Dense Layers | 1 |
| Attention Hidden Dimension | 7168 |
| MoE Hidden Dimension (per Expert) | 2048 |
| Number of Attention Heads | 64 |
| Number of Experts | 384 |
| Selected Experts per Token | 8 |
| Number of Shared Experts | 1 |
| Vocabulary Size | 160K |
| Context Length | 128K |
| Attention Mechanism | MLA |
| Activation Function | SwiGLU |

An illustrative sketch of how these routing numbers fit together is given after the evaluation results below.

## 3. Evaluation Results

### Instruction model evaluation results

| Benchmark | Metric | Kimi K2 Instruct | DeepSeek-V3-0324 | Qwen3-235B-A22B (non-thinking) | Claude Sonnet 4 (w/o extended thinking) | Claude Opus 4 (w/o extended thinking) | GPT-4.1 | Gemini 2.5 Flash Preview (05-20) |
|---|---|---|---|---|---|---|---|---|
| **Coding Tasks** | | | | | | | | |
| LiveCodeBench v6 (Aug 24 - May 25) | Pass@1 | 53.7 | 46.9 | 37.0 | 48.5 | 47.4 | 44.7 | 44.7 |
| … | | | | | | | | |
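The Model Summary table describes a fairly standard top-k routed MoE block: 384 routed experts plus 1 shared expert, 8 routed experts selected per token, and 2048-wide SwiGLU expert FFNs over a 7168-dimensional hidden state. The sketch below is purely illustrative PyTorch and is not the Kimi K2 implementation; all class and parameter names are hypothetical, and details such as router score normalization, load-balancing losses, and expert parallelism are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUExpert(nn.Module):
    """One expert FFN: SwiGLU with a per-expert hidden width d_ff (2048 in the table)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


class TopKRoutedMoE(nn.Module):
    """Illustrative MoE block: n_experts routed experts, top_k picked per token, plus one shared expert."""

    def __init__(self, d_model: int = 7168, d_ff: int = 2048,
                 n_experts: int = 384, top_k: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(SwiGLUExpert(d_model, d_ff) for _ in range(n_experts))
        self.shared_expert = SwiGLUExpert(d_model, d_ff)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                               # (num_tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)     # 8 experts per token
        weights = weights.softmax(dim=-1)                     # normalise over the chosen experts
        out = self.shared_expert(x)                           # shared expert sees every token
        for slot in range(self.top_k):
            for e in chosen[:, slot].unique().tolist():
                mask = chosen[:, slot] == e                   # tokens routed to expert e in this slot
                out[mask] = out[mask] + weights[mask, slot, None] * self.experts[e](x[mask])
        return out


if __name__ == "__main__":
    # Tiny dimensions so the sketch runs anywhere; the real model uses the table's values.
    moe = TopKRoutedMoE(d_model=64, d_ff=32, n_experts=16, top_k=4)
    tokens = torch.randn(10, 64)
    print(moe(tokens).shape)  # torch.Size([10, 64])
```

The point of the sketch is only to show why 32B of the 1T parameters are active per token: each token touches the attention/dense parameters, the shared expert, and just 8 of the 384 routed experts.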
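For Kimi-K2-Instruct, one common way to try a chat model of this kind is through an OpenAI-compatible endpoint (for example, a local vLLM or SGLang deployment, or a hosted API). The snippet below is only a sketch under that assumption: the `base_url`, `api_key`, model name, system prompt, and temperature are placeholders, not official values; consult the official documentation for the recommended settings.

```python
from openai import OpenAI

# Assumes Kimi-K2-Instruct is served behind an OpenAI-compatible endpoint;
# the URL, key, and model name below are placeholders for illustration only.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Kimi-K2-Instruct",
    messages=[
        {"role": "system", "content": "You are Kimi, an AI assistant created by Moonshot AI."},
        {"role": "user", "content": "Briefly explain what a mixture-of-experts model is."},
    ],
    temperature=0.6,
)
print(response.choices[0].message.content)
```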