by Qwen
Open source · 109k downloads · 51 likes
Qwen3-14B-Base is an advanced language model developed by the Qwen team. With 14.8 billion parameters, pretrained on 36 trillion tokens spanning 119 languages, it processes contexts of up to 32K tokens, making it well suited to applications that require deep, long-document comprehension. Innovative training techniques, such as global-batch load balancing for the MoE-style variants and architectural improvements, enable it to surpass its predecessor, Qwen2.5, in logical reasoning, science, programming, and multilingual understanding. Its three-stage pretraining approach, with hyperparameters tuned according to scaling laws, improves training efficiency and stability, making it a robust choice for developers, researchers, and businesses integrating it into AI pipelines to automate complex tasks or generate precise content.
Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Building upon extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers key improvements over the previously released Qwen2.5, including broader multilingual pretraining data, stronger reasoning and coding performance, and more efficient, stable training.
Qwen3-14B-Base has the following features:
- Dense (non-MoE) causal language model
- 14.8 billion parameters
- Pretrained on 36 trillion tokens covering 119 languages
- Context length of up to 32K tokens
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.
The code for Qwen3 has been merged into the latest Hugging Face transformers, and we advise you to use the latest version of transformers.
With transformers<4.51.0, you will encounter the following error:
KeyError: 'qwen3'
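As a minimal sketch of how to guard against that failure mode, the snippet below compares the installed `transformers` version against the 4.51.0 minimum before loading the model; the helper names here are illustrative, not part of any library API:

```python
# Sketch: fail early with a clear message instead of hitting
# KeyError: 'qwen3' from transformers < 4.51.0.
MIN_VERSION = (4, 51, 0)  # first release that recognizes the 'qwen3' architecture

def parse_version(version: str) -> tuple:
    """Parse a dotted version string such as '4.51.0' into an int tuple."""
    return tuple(int(part) for part in version.split(".")[:3])

def check_transformers_version(installed: str) -> None:
    """Raise a descriptive error if the installed version is too old."""
    if parse_version(installed) < MIN_VERSION:
        raise RuntimeError(
            f"transformers {installed} is too old for Qwen3 (needs >= 4.51.0); "
            f"loading the model would fail with KeyError: 'qwen3'."
        )

# In practice you would pass transformers.__version__ here.
check_transformers_version("4.51.0")  # passes silently
```

In real code, `import transformers` and call the check with `transformers.__version__` before `from_pretrained`, so users get an actionable upgrade hint rather than a bare `KeyError`.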
Detailed evaluation results are reported in this 📑 blog.
If you find our work helpful, feel free to cite us:
@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388},
}