by ggml-org
Open source · 47k downloads · 15 likes
embeddinggemma-300m-qat-q8_0-GGUF is an embedding model that converts text into dense numerical vectors for tasks such as semantic search, classification, and clustering. Its compact size (300M parameters, quantization-aware trained and quantized to Q8_0) and GGUF format give it a good balance of quality and efficiency, making it well suited to local or embedded deployments. The embeddings it produces can be normalized under several norms (L2, L1, and others), allowing adaptation to specific application needs. The model runs accurately and quickly even on modest hardware, and is accessible through simple tools such as a local server or a command-line interface, so developers can add contextual-understanding features to their systems without relying on costly cloud resources.
Recommended way to run this model:
llama-server -hf ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --embeddings
Then the endpoint can be accessed at http://localhost:8080/embedding, for
example using curl:
curl --request POST \
--url http://localhost:8080/embedding \
--header "Content-Type: application/json" \
--data '{"input": "Hello embeddings"}' \
--silent
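The same request can be issued programmatically. The sketch below is a minimal Python example, assuming the server started above is reachable on localhost:8080; the exact shape of the returned JSON may vary between llama-server versions, so it is printed rather than unpacked:

```python
import json
import urllib.request

def get_embedding(text: str, url: str = "http://localhost:8080/embedding"):
    """POST `text` to a running llama-server (started with --embeddings)
    and return the parsed JSON response."""
    payload = json.dumps({"input": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    try:
        result = get_embedding("Hello embeddings")
        print(json.dumps(result)[:200])  # preview the raw response
    except OSError:
        # Connection refused: start llama-server as shown above first.
        print("llama-server is not reachable on localhost:8080")
```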
Alternatively, the llama-embedding command line tool can be used:
llama-embedding -hf ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --verbose-prompt -p "Hello embeddings"
When a model uses pooling, or the pooling method is specified using --pooling,
the normalization can be controlled by the embd_normalize parameter.
The default value is 2, which means the embeddings are normalized using
the Euclidean norm (L2). Other options are:
-1: no normalization
0: scale by the maximum absolute value (int16 range)
1: taxicab norm (L1)
2: Euclidean norm (L2, the default)
>2: p-norm, with p equal to the given value
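As a numeric illustration, the Python sketch below applies these normalization rules to a toy vector (the vector is made up, not model output, and the int16 scaling constant for option 0 is an assumption about llama.cpp's implementation):

```python
def normalize(vec: list, embd_norm: int) -> list:
    """Mirror the embd_normalize options on a plain Python list."""
    if embd_norm == -1:      # -1: no normalization
        return list(vec)
    if embd_norm == 0:       # 0: scale so the largest |component| fits int16
        m = max(abs(x) for x in vec)
        return [x * 32767.0 / m for x in vec]
    # >= 1: p-norm (1 = taxicab/L1, 2 = Euclidean/L2, >2 = general p-norm)
    p = embd_norm
    norm = sum(abs(x) ** p for x in vec) ** (1.0 / p)
    return [x / norm for x in vec]

v = [3.0, -4.0]
print(normalize(v, 2))   # [0.6, -0.8]: unit Euclidean (L2) length
print(normalize(v, 1))   # components sum to 1 in absolute value (L1)
print(normalize(v, -1))  # [3.0, -4.0]: unchanged
```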
This parameter can be passed in the request body to llama-server, for example by replacing the --data line of the curl command above:
--data '{"input": "Hello embeddings", "embd_normalize": -1}' \
For llama-embedding, pass --embd-normalize <value>, for example:
llama-embedding -hf ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --embd-normalize -1 -p "Hello embeddings"
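Once embeddings are retrieved, semantic search reduces to comparing vectors, typically with cosine similarity. A minimal, self-contained Python sketch (the vectors here are toy values, not real model output):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With L2-normalized embeddings (embd_normalize=2, the default), every
# vector has unit length, so cosine similarity is simply the dot product.
query = [1.0, 0.0]
print(cosine_similarity(query, [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity(query, [0.0, 1.0]))  # 0.0 (orthogonal)
```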