## Introduction

Today, we are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results on benchmark evaluations of coding, math, general capabilities, and more, compared with other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. Additionally, the small MoE model, Qwen3-30B-A3B, outcompetes QwQ-32B, which uses about 10 times as many activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2.5-72B-Instruct.

We are open-weighting two MoE models: Qwen3-235B-A22B, a large model with 235 billion total parameters and 22 billion activated parameters, and Qwen3-30B-A3B, a smaller MoE model with 30 billion total parameters and 3 billion activated parameters. Additionally, six dense models are also open-weighted under the Apache 2.0 license: Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B.

| Models | Layers | Heads (Q / KV) | Tie Embedding | Context Length |
| --- | --- | --- | --- | --- |
| Qwen3-0.6B | 28 | 16 / 8 | Yes | 32K |
| Qwen3-1.7B | 28 | 16 / 8 | Yes | 32K |
| Qwen3-4B | 36 | 32 / 8 | Yes | 32K |
| Qwen3-8B | 36 | 32 / 8 | No | 128K |
| Qwen3-14B | 40 | 40 / 8 | No | 128K |
| Qwen3-32B | 64 | 64 / 8 | No | 128K |

| Models | Layers | Heads (Q / KV) | # Experts (Total / Activated) | Context Length |
| --- | --- | --- | --- | --- |
| Qwen3-30B-A3B | 48 | 32 / 4 | 128 / 8 | 128K |
| Qwen3-235B-A22B | 94 | 64 / 4 | 128 / 8 | 128K |

The post-trained models, such as Qwen3-30B-A3B, along with their pre-trained counterparts (e.g., Qwen3-30B-A3B-Base), are now available on platforms like Hugging Face, ModelScope, and Kaggle. For deployment, we recommend frameworks like SGLang and vLLM. For local usage, tools such as Ollama, LMStudio, MLX, llama.cpp, and KTransformers are highly recommended. These options ensure that users can easily integrate Qwen3 into their workflows, whether in research, development, or production environments (see the quickstart sketch below).

We believe that the release and open-sourcing of Qwen3 will significantly advance the research and development of large foundation ...
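As a minimal quickstart sketch, the following shows how one might load a Qwen3 post-trained model with Hugging Face Transformers and generate a response. The model ID `Qwen/Qwen3-30B-A3B` and the `enable_thinking` chat-template flag are taken from the Qwen3 model cards; a recent `transformers` release with Qwen3 support is assumed.

```python
# Sketch: loading a Qwen3 post-trained model with Hugging Face Transformers.
# Assumes the Hub model ID "Qwen/Qwen3-30B-A3B" and Qwen3 support in transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-30B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # shard across available GPUs
)

messages = [
    {"role": "user", "content": "Give me a short introduction to large language models."}
]

# Build the chat prompt; enable_thinking toggles Qwen3's reasoning mode
# (flag name as documented in the Qwen3 model cards).
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)

# Strip the prompt tokens and decode only the newly generated text.
response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
)
print(response)
```

For serving, the same checkpoint can be exposed through an OpenAI-compatible endpoint with SGLang or vLLM; the exact launch flags depend on the framework version, so consult the respective documentation.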