by Qwen
The Qwen3-4B-Base model is a compact member of the Qwen3 family, designed to deliver strong natural language processing capabilities at a small parameter count. Pretrained on a corpus of 36 trillion tokens spanning 119 languages, it performs well on comprehension, text generation, and reasoning, with a particular emphasis on STEM fields, programming, and multilingual data. Its architecture incorporates techniques such as *qk layernorm*, and a three-phase training process enables it to handle contexts of up to 32,768 tokens while maintaining training stability. Well suited to applications requiring deep analysis or nuanced text production, it offers versatility and efficiency on complex tasks, making it a good fit for developers, researchers, and businesses seeking strong performance without relying on heavier models.
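To make the *qk layernorm* mention above concrete, here is a minimal, hypothetical NumPy sketch of the idea: queries and keys are RMS-normalized per head before the dot product, which bounds the attention logits and helps stabilize training. This is an illustrative single-head simplification (learned gains, multi-head reshaping, and masking are omitted), not the actual Qwen3 implementation.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm without a learned gain: scale each vector by its
    # root-mean-square over the last axis (simplified for illustration).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def qk_layernorm_attention(q, k, v):
    # QK-norm: normalize queries and keys before computing attention
    # scores, so the logits cannot grow unboundedly with hidden width.
    q, k = rms_norm(q), rms_norm(k)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the normalized q and k vectors have unit RMS, each attention logit is bounded regardless of how activations scale during training.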
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Building upon extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers significant improvements over the previously released Qwen2.5.
Qwen3-4B-Base has the following features:
- Type: causal language model (dense)
- Training stage: pretraining, on 36 trillion tokens across 119 languages
- Context length: 32,768 tokens
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.
The code for Qwen3 has been merged into the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3'
```
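The `KeyError` arises because older `transformers` releases do not have `qwen3` registered as a model type. A simple way to guard against it is to gate on the installed version before loading the model; the helper below is an illustrative sketch (the function name is ours, and it assumes a plain `major.minor.patch` version string).

```python
def supports_qwen3(transformers_version: str) -> bool:
    # Qwen3 support requires transformers >= 4.51.0; older releases
    # raise KeyError: 'qwen3' when the model config is resolved.
    major, minor, *_ = (int(part) for part in transformers_version.split("."))
    return (major, minor) >= (4, 51)

# Example: supports_qwen3("4.50.3") -> False, supports_qwen3("4.51.0") -> True
```

Alternatively, simply upgrade with `pip install --upgrade "transformers>=4.51.0"`.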
Detailed evaluation results are reported in this 📑 blog.
If you find our work helpful, feel free to cite it:
```bibtex
@misc{qwen3technicalreport,
    title={Qwen3 Technical Report},
    author={Qwen Team},
    year={2025},
    eprint={2505.09388},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.09388},
}
```