by boboliu
Qwen3-Embedding-4B-W4A16-G128 is a quantized version of the Qwen3-Embedding-4B model, designed to cut memory footprint while preserving embedding quality. It converts text into dense numerical vectors for tasks such as information retrieval, classification, and semantic similarity. The 4-bit weight quantization costs only about 0.72% on the C-MTEB benchmark while substantially reducing VRAM usage compared to the original model, making it a practical choice for deploying embedding pipelines on limited hardware or at scale without compromising result quality.
GPTQ-quantized Qwen/Qwen3-Embedding-4B, using THUIR/T2Ranking and m-a-p/COIG-CQIA as the calibration set.
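For reference, a quantization along these lines can be run with gptqmodel. The sketch below is a minimal, hedged example: it assumes gptqmodel's `QuantizeConfig` / `GPTQModel.load` / `quantize` / `save` API, and a toy calibration list stands in for the actual T2Ranking/COIG-CQIA sampling and preprocessing, which this card does not document.

```python
# Minimal sketch of W4A16 G128 GPTQ quantization with gptqmodel.
# Assumption: the toy calibration list below stands in for the real
# THUIR/T2Ranking and m-a-p/COIG-CQIA preprocessing used for this model.
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,         # W4: 4-bit weights (activations stay 16-bit)
    group_size=128  # G128: one scale/zero-point per group of 128 weights
)

calibration = [
    "What are the benefits of regular exercise?",
    "Explain the difference between supervised and unsupervised learning.",
    # ... in practice, samples drawn from T2Ranking and COIG-CQIA
]

model = GPTQModel.load("Qwen/Qwen3-Embedding-4B", quant_config)
model.quantize(calibration)
model.save("Qwen3-Embedding-4B-W4A16-G128")
```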
VRAM usage: 17430 MB -> 11000 MB (without FlashAttention-2).
About 0.72% average score lost on C-MTEB.
Evaluation was performed with the official evaluation code.
| Model (C-MTEB) | Param. | Mean (Task) | Mean (Type) | Class. | Clust. | Pair Class. | Rerank. | Retr. | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 68.52 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.50 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.50 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-4B | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| This Model | 4B-W4A16 | 71.75 | 73.05 | 75.43 | 77.51 | 83.04 | 65.73 | 76.15 | 60.47 |
To run the model, install the dependencies with `pip install compressed-tensors optimum` plus either `auto-gptq` or `gptqmodel`, then follow the official Qwen3-Embedding usage guide; a minimal loading sketch follows.
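As a hedged illustration of the embedding workflow (a sketch mirroring the upstream Qwen3-Embedding transformers example, not this card's official guide verbatim), the snippet below loads the checkpoint through the standard transformers API and computes L2-normalized, last-token-pooled embeddings. The repo id is an assumption based on this card's title.

```python
# Sketch of embedding inference, mirroring the upstream Qwen3-Embedding
# transformers example (last-token pooling + L2 normalization).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_id = "boboliu/Qwen3-Embedding-4B-W4A16-G128"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
model = AutoModel.from_pretrained(model_id, device_map="auto").eval()

texts = [
    "What is the capital of China?",
    "Beijing is the capital of China.",
]
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=8192, return_tensors="pt").to(model.device)

with torch.no_grad():
    hidden = model(**batch).last_hidden_state

# With left padding, every sequence ends at position -1,
# so the last hidden state is the sentence embedding.
embeddings = F.normalize(hidden[:, -1], p=2, dim=1)
print(embeddings @ embeddings.T)  # cosine-similarity matrix
```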