by mlx-community
Open source · 92k downloads · 7 likes
The Qwen3 Embedding 0.6B 4bit DWQ model is an optimized, lightweight version of the Qwen3 Embedding model, designed to generate text vector representations (embeddings) efficiently. Thanks to its small size and 4-bit quantization, it strikes a strong balance between performance and resource requirements while maintaining high quality for natural language processing tasks. It generates embeddings for both short and long texts, making it suitable for applications such as semantic search, document classification, and text similarity. The model stands out for its lightweight design and compatibility with resource-constrained environments while delivering strong performance across a range of use cases. It is particularly valuable for developers seeking a fast, cost-effective embedding solution without compromising accuracy.
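For tasks like semantic search or text similarity, the embeddings produced by a model like this are typically compared with cosine similarity. The sketch below shows that step on toy vectors; the `query`/`doc_*` values are placeholders standing in for real (much higher-dimensional) model output, not actual Qwen3-Embedding vectors.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" standing in for model output.
query = [0.1, 0.3, 0.5, 0.7]
doc_a = [0.1, 0.3, 0.5, 0.7]   # same direction as the query
doc_b = [0.7, 0.5, 0.3, 0.1]   # different direction

# Rank documents by similarity to the query.
scores = {"doc_a": cosine_similarity(query, doc_a),
          "doc_b": cosine_similarity(query, doc_b)}
best = max(scores, key=scores.get)
```

An identical direction scores 1.0, so `doc_a` ranks first; in a real pipeline the vectors would come from the embedding model rather than being hardcoded.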
This model mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ was converted to MLX format from Qwen/Qwen3-Embedding-0.6B using mlx-lm version 0.24.1.
pip install mlx-lm
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hugging Face Hub
model, tokenizer = load("mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ")

prompt = "hello"

# Apply the chat template when the tokenizer defines one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)