Qwen3 8B Base

by Qwen

Open source · 495k downloads · 97 likes

Rating: 2.5 (97 reviews) · Chat · API & Local

About

Qwen3 8B Base is a pretrained language model for understanding and generating text. It was trained on a corpus of 36 trillion tokens covering 119 languages, with data spanning reasoning, programming, science, and general knowledge. Its architecture includes refinements such as qk layernorm, and its three-stage pre-training lets it handle long contexts of up to 32,768 tokens with improved stability and performance. It suits applications that need deep language understanding and can be adapted to complex tasks such as content automation, technical assistance, and multilingual analysis.

Documentation

Qwen3-8B-Base

Qwen3 Highlights

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Building on extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers the following key improvements over the previously released Qwen2.5:

  • Expanded Higher-Quality Pre-training Corpus: Qwen3 is pre-trained on 36 trillion tokens across 119 languages — tripling the language coverage of Qwen2.5 — with a much richer mix of high-quality data, including coding, STEM, reasoning, book, multilingual, and synthetic data.
  • Training Techniques and Model Architecture: Qwen3 incorporates a series of training techniques and architectural refinements, including global-batch load balancing loss for MoE models and qk layernorm for all models, leading to improved stability and overall performance.
  • Three-stage Pre-training: Stage 1 focuses on broad language modeling and general knowledge acquisition, Stage 2 improves reasoning skills like STEM, coding, and logical reasoning, and Stage 3 enhances long-context comprehension by extending training sequence lengths up to 32k tokens.
  • Scaling Law Guided Hyperparameter Tuning: Through comprehensive scaling law studies across the three-stage pre-training pipeline, Qwen3 systematically tunes critical hyperparameters — such as learning rate scheduler and batch size — separately for dense and MoE models, resulting in better training dynamics and final performance across different model scales.

Model Overview

Qwen3-8B-Base has the following features:

  • Type: Causal Language Model
  • Training Stage: Pretraining
  • Number of Parameters: 8.2B
  • Number of Parameters (Non-Embedding): 6.95B
  • Number of Layers: 36
  • Number of Attention Heads (GQA): 32 for Q and 8 for KV
  • Context Length: 32,768
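The figures in this list support a few quick derived numbers. The sketch below is illustrative arithmetic only: the function names are hypothetical, the parameter counts are the rounded values quoted above, and the per-head dimension is left as a caller-supplied argument because it is not stated here.

```python
# Rounded figures copied from the Model Overview list.
TOTAL_PARAMS = 8_200_000_000
NON_EMBEDDING_PARAMS = 6_950_000_000
NUM_LAYERS = 36
NUM_Q_HEADS = 32
NUM_KV_HEADS = 8
CONTEXT_LENGTH = 32_768


def embedding_params(total: int, non_embedding: int) -> int:
    """Parameters attributed to embeddings (total minus non-embedding)."""
    return total - non_embedding


def queries_per_kv_head(num_q_heads: int, num_kv_heads: int) -> int:
    """Number of query heads that share one key/value head under GQA."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    return num_q_heads // num_kv_heads


def kv_cache_entries(num_layers: int, num_kv_heads: int,
                     seq_len: int, head_dim: int) -> int:
    """Count of cached key and value scalars for one sequence.

    head_dim is an assumption supplied by the caller; it is not
    given in the overview above.
    """
    return 2 * num_layers * num_kv_heads * seq_len * head_dim


def gqa_cache_ratio(num_q_heads: int, num_kv_heads: int) -> float:
    """KV-cache size relative to one KV head per query head.

    The ratio does not depend on head_dim or sequence length.
    """
    return num_kv_heads / num_q_heads


if __name__ == "__main__":
    print(embedding_params(TOTAL_PARAMS, NON_EMBEDDING_PARAMS))
    print(queries_per_kv_head(NUM_Q_HEADS, NUM_KV_HEADS))
    print(gqa_cache_ratio(NUM_Q_HEADS, NUM_KV_HEADS))
```

With 32 query heads and 8 KV heads, four query heads share each key/value head, so the key/value cache is a quarter of what it would be with one KV head per query head.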

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.

Requirements

Qwen3 support is included in recent releases of Hugging Face transformers, and we advise you to use the latest version.

With transformers<4.51.0, you will encounter the following error:

KeyError: 'qwen3'
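One way to surface this problem early is to check the installed version before loading the model. The following is a minimal sketch, not an official snippet: `parse_version`, `supports_qwen3`, and `load_qwen3_base` are hypothetical helper names, the hub identifier `Qwen/Qwen3-8B-Base` is assumed, and the simplified parser treats pre-release builds such as `4.51.0.dev0` as equal to `4.51.0`.

```python
import re

# Minimum version implied by the requirement above.
MIN_TRANSFORMERS = (4, 51, 0)


def parse_version(version: str) -> tuple:
    """Parse the leading numeric parts of 'major.minor.patch'.

    Simplified on purpose: suffixes such as 'rc1' or '.dev0' are ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_qwen3(version: str) -> bool:
    """True if this transformers version should recognise 'qwen3'."""
    return parse_version(version) >= MIN_TRANSFORMERS


def load_qwen3_base(model_id: str = "Qwen/Qwen3-8B-Base"):
    """Load tokenizer and model, failing with a clear message if too old.

    The default model_id is an assumed hub identifier.
    """
    import transformers  # imported lazily so the helpers work without it

    if not supports_qwen3(transformers.__version__):
        raise RuntimeError(
            f"transformers {transformers.__version__} is older than 4.51.0; "
            "upgrade to avoid KeyError: 'qwen3'"
        )
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling `load_qwen3_base()` downloads the weights, so the version check runs first and fails fast on an outdated installation.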

Evaluation & Performance

Detailed evaluation results are reported in the project blog.

Citation

If you find our work helpful, feel free to cite us.

@misc{qwen3technicalreport,
      title={Qwen3 Technical Report}, 
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388}, 
}

Specifications

  • Category: Chat
  • Access: API & Local
  • License: Open Source
  • Pricing: Open Source
  • Parameters: 8B