by mlx-community
Open source · 92k downloads · 7 likes
The Qwen3 Embedding 0.6B 4bit DWQ model is an optimized, lightweight version of the Qwen3 Embedding model, designed to generate text vector representations (embeddings) efficiently. Thanks to its small size and 4-bit quantization, it strikes a strong balance between performance and resource requirements while maintaining high quality for natural language processing tasks. It generates embeddings for both short and long texts, making it suitable for applications such as semantic search, document classification, and text similarity. The model stands out for its lightweight design and compatibility with resource-constrained environments while delivering strong performance across a range of use cases. It is particularly valuable for developers seeking a fast, cost-effective embedding solution without compromising accuracy.
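For tasks like semantic search or text similarity, the embeddings produced by a model like this are typically compared with cosine similarity. The sketch below shows that step on toy vectors; the `query`/`doc_*` values are placeholders standing in for real (much higher-dimensional) model output, not actual Qwen3-Embedding vectors.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" standing in for model output.
query = [0.1, 0.3, 0.5, 0.7]
doc_a = [0.1, 0.3, 0.5, 0.7]   # same direction as the query
doc_b = [0.7, 0.5, 0.3, 0.1]   # different direction

# Rank documents by similarity to the query.
scores = {"doc_a": cosine_similarity(query, doc_a),
          "doc_b": cosine_similarity(query, doc_b)}
best = max(scores, key=scores.get)
```

An identical direction scores 1.0, so `doc_a` ranks first; in a real pipeline the vectors would come from the embedding model rather than being hardcoded.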
This model mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ was converted to MLX format from Qwen/Qwen3-Embedding-0.6B using mlx-lm version 0.24.1.
pip install mlx-lm
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hugging Face Hub
model, tokenizer = load("mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ")

prompt = "hello"

# Apply the chat template when the tokenizer defines one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)