DeepSeek-V3.2-Exp


Summary

Introduction

We are excited to announce the official release of DeepSeek-V3.2-Exp, an experimental version of our model. As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention, a sparse attention mechanism designed to explore and validate optimizations for training and inference efficiency in long-context scenarios.

This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences.

DeepSeek Sparse Attention (DSA) achieves fine-grained sparse attention for the first time, delivering substantial improvements in long-context training and inference efficiency while maintaining virtually identical model output quality.

To rigorously evaluate the impact of introducing sparse attention, we deliberately aligned the training configurations of DeepSeek-V3.2-Exp with V3.1-Terminus. Across public benchmarks in various domains, DeepSeek-V3.2-Exp demonstrates performance on par with V3.1-Terminus.

Benchmark                    DeepSeek-V3.1-Terminus   DeepSeek-V3.2-Exp

Reasoning Mode w/o Tool Use
MMLU-Pro                     85.0                     85.0
GPQA-Diamond                 80.7                     79.9
Humanity's Last Exam         21.7                     19.8
LiveCodeBench                74.9                     74.1
AIME 2025                    88.4                     89.3
HMMT 2025                    86.1                     83.6
Codeforces                   2046                     2121
Aider-Polyglot               76.1                     74.5

Agentic Tool Use
BrowseComp                   38.5                     40.1
BrowseComp-zh                45.0                     47.9
SimpleQA                     96.8                     97.1
SWE Verified                 68.4                     67.8
SWE-bench Multilingual       57.8                     57.9
Terminal-bench               36.7                     37.7

Open-Source Kernels

For TileLang kernels with better readability and research-purpose design, please refer to TileLang. For high-performance CUDA kernels, indexer logit kernels (including paged versions) are available in DeepGEMM. Sparse attention kernels are released in FlashMLA.

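
To make the idea of fine-grained sparse attention concrete, here is a minimal sketch of generic top-k sparse attention for a single query in plain Python. It is not DeepSeek's implementation: the summary above does not describe how DSA selects tokens, so the selection rule (keep the k keys with the highest index score), the function name topk_sparse_attention, and the way index_scores are produced are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(query, keys, values, index_scores, k):
    """Attend only to the k keys with the highest index score.

    index_scores stands in for the output of some cheap scoring step
    (hypothetical here); the full softmax attention is then computed
    over the selected keys only, instead of over every key.
    """
    k = min(k, len(keys))
    # Positions of the k highest-scoring keys (assumed selection rule).
    selected = sorted(range(len(keys)),
                      key=lambda i: index_scores[i],
                      reverse=True)[:k]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]
    weights = softmax(logits)
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, selected))
              for d in range(dim)]
    return output, sorted(selected)


# Toy example: four keys, of which only two are attended to.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
# Assumption: the index score is simply the raw query-key dot product.
index_scores = [dot(query, key) for key in keys]
output, selected = topk_sparse_attention(query, keys, values, index_scores, k=2)
```

With k equal to the number of keys, the function reduces to ordinary dense attention; the attention cost per query grows with k rather than with the full sequence length, which is the general reason sparse attention helps in long-context settings.
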
How to Run Locally

HuggingFace

We provide updated inference demo code in the inference folder to help the community quickly get started with our model and understand its ...

First seen: 2025-09-29 11:33

Last seen: 2025-09-29 18:34
