AI Explorer

Find and compare the best artificial intelligence tools for your projects.



Qwen3 Embedding 4B W4A16 G128

by boboliu

Open source · 190k downloads · 5 likes

1.0 (5 reviews) · Embedding · API & Local
About

Qwen3 Embedding 4B W4A16 G128 is an optimized, quantized version of the Qwen3-Embedding-4B model, designed to reduce memory footprint while maintaining high performance. It is well suited to text-embedding tasks, converting text into numerical vectors for applications such as information retrieval, classification, and semantic similarity. Thanks to its quantization scheme, it strikes a strong balance between efficiency and accuracy, with a performance drop of only about 0.72% on the C-MTEB benchmark. The model stands out for its ability to run on limited hardware, significantly reducing VRAM usage compared with the original version. It is particularly valuable for developers seeking to deploy embedded or large-scale AI solutions without compromising result quality.
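The retrieval and semantic-similarity use cases described above boil down to comparing embedding vectors, typically with cosine similarity. A minimal sketch with toy vectors standing in for real model outputs (the numbers below are illustrative, not actual Qwen3 embeddings):

```python
import numpy as np

# Toy 3-dimensional stand-ins for document embeddings.
doc_vecs = np.array([[0.90, 0.10, 0.00],
                     [0.10, 0.80, 0.10],
                     [0.85, 0.15, 0.05]])
query = np.array([1.0, 0.0, 0.0])

def cosine(a, b):
    """Cosine similarity: dot product of the L2-normalized vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [cosine(query, d) for d in doc_vecs]
best = int(np.argmax(scores))  # index of the most similar document
```

With a real embedding model, `doc_vecs` and `query` would come from encoding text, but the ranking step is exactly this.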

Documentation

Qwen3-Embedding-4B-W4A16-G128

A GPTQ-quantized version of Qwen/Qwen3-Embedding-4B, using THUIR/T2Ranking and m-a-p/COIG-CQIA as the calibration set.

What's the benefit?

VRAM usage drops from 17430 MB to 11000 MB (without FlashAttention 2).
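The name encodes the scheme: 4-bit weights (W4), 16-bit activations (A16), quantization groups of 128 weights sharing one scale (G128). A minimal numpy sketch of symmetric group-wise 4-bit weight quantization (illustrative only; GPTQ additionally uses error-compensated rounding driven by the calibration set):

```python
import numpy as np

def quantize_w4_g128(w, group=128):
    """Symmetric 4-bit quantization in groups of `group` along the last
    axis: each group stores int4 codes in [-8, 7] plus one float scale."""
    rows, cols = w.shape
    assert cols % group == 0
    g = w.reshape(rows, cols // group, group)
    scale = np.abs(g).max(axis=-1, keepdims=True) / 7.0
    q = np.clip(np.round(g / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from codes and scales."""
    return (q * scale).reshape(q.shape[0], -1)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 256)).astype(np.float32)
q, s = quantize_w4_g128(w)
w_hat = dequantize(q, s)
max_err = float(np.abs(w - w_hat).max())  # bounded by half a quantization step
```

Storing 4-bit codes plus one scale per 128 weights is what shrinks the weight memory roughly fourfold versus 16-bit storage, which is where the VRAM saving comes from.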

What's the cost?

An approximately 0.72% drop in the C-MTEB mean score.

Evaluation performed with official code.

C-MTEB results:

| Model | Param. | Mean (Task) | Mean (Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 68.52 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-4B | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| This model | 4B-W4A16 | 71.75 | 73.05 | 75.43 | 77.51 | 83.04 | 65.73 | 76.15 | 60.47 |
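The quoted ~0.72% cost can be checked directly against the Mean (Task) column of the table:

```python
# Relative drop in C-MTEB Mean (Task) score, fp16 baseline vs. W4A16.
fp16_score = 72.27   # Qwen3-Embedding-4B
w4a16_score = 71.75  # this model
drop_pct = (fp16_score - w4a16_score) / fp16_score * 100  # ≈ 0.72
```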

How to use it?

Install the dependencies with `pip install compressed-tensors optimum` plus either `auto-gptq` or `gptqmodel`, then follow the official usage guide.
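For orientation, the official Qwen embedding usage code pools the hidden state of each sequence's last non-padding token before normalizing. A numpy re-sketch of that pooling step, assuming right padding (this is an illustrative helper, not the official implementation):

```python
import numpy as np

def last_token_pool(hidden, mask):
    """Select the hidden state of each sequence's last real (non-padding)
    token. hidden: (batch, seq_len, dim); mask: (batch, seq_len) of 0/1."""
    lengths = mask.sum(axis=1) - 1  # index of the last real token per row
    return hidden[np.arange(hidden.shape[0]), lengths]

# (batch=2, seq_len=4, dim=3) dummy hidden states.
hidden = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 0],    # 3 real tokens, 1 padding
                 [1, 1, 1, 1]])   # 4 real tokens
pooled = last_token_pool(hidden, mask)  # shape (2, 3)
```

In the real pipeline these hidden states come from the transformer's last layer, and the pooled vectors are L2-normalized to produce the final embeddings.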

Capabilities & Tags

sentence-transformers, safetensors, qwen3, text-generation, transformers, sentence-similarity, feature-extraction, text-embeddings-inference, endpoints_compatible, compressed-tensors
Specifications

  • Category: Embedding
  • Access: API & Local
  • License: Open source
  • Pricing: Open source
  • Parameters: 4B
  • Rating: 1.0
