by ggml-org
Open source · 47k downloads · 15 likes
embeddinggemma-300m-qat-q8_0-GGUF is an embedding model that converts text into dense numerical vectors for tasks such as semantic search, classification, and clustering. Its compact size (300M parameters, quantization-aware trained and quantized to Q8_0) and GGUF format give it a good balance of quality and efficiency, making it well suited to local or embedded deployments. The embeddings it produces can be normalized under several norms (L2, L1, and others), allowing adaptation to specific application needs. The model runs accurately and quickly even on modest hardware, and is accessible through simple tools such as a local server or a command-line interface, so developers can add contextual-understanding features to their systems without relying on costly cloud resources.
Recommended way to run this model:
llama-server -hf ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --embeddings
Then the endpoint can be accessed at http://localhost:8080/embedding, for
example using curl:
curl --request POST \
--url http://localhost:8080/embedding \
--header "Content-Type: application/json" \
--data '{"input": "Hello embeddings"}' \
--silent
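The same request can be issued programmatically. The sketch below is a minimal Python example, assuming the server started above is reachable on localhost:8080; the exact shape of the returned JSON may vary between llama-server versions, so it is printed rather than unpacked:

```python
import json
import urllib.request

def get_embedding(text: str, url: str = "http://localhost:8080/embedding"):
    """POST `text` to a running llama-server (started with --embeddings)
    and return the parsed JSON response."""
    payload = json.dumps({"input": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    try:
        result = get_embedding("Hello embeddings")
        print(json.dumps(result)[:200])  # preview the raw response
    except OSError:
        # Connection refused: start llama-server as shown above first.
        print("llama-server is not reachable on localhost:8080")
```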
Alternatively, the llama-embedding command line tool can be used:
llama-embedding -hf ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --verbose-prompt -p "Hello embeddings"
When a model uses pooling, or the pooling method is specified using --pooling,
the normalization can be controlled by the embd_normalize parameter.
The default value is 2, which means the embeddings are normalized using
the Euclidean norm (L2). Other options are:
-1: no normalization
0: scale by the maximum absolute value (int16 range)
1: taxicab norm (L1)
2: Euclidean norm (L2, the default)
>2: p-norm, with p equal to the given value
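As a numeric illustration, the Python sketch below applies these normalization rules to a toy vector (the vector is made up, not model output, and the int16 scaling constant for option 0 is an assumption about llama.cpp's implementation):

```python
def normalize(vec: list, embd_norm: int) -> list:
    """Mirror the embd_normalize options on a plain Python list."""
    if embd_norm == -1:      # -1: no normalization
        return list(vec)
    if embd_norm == 0:       # 0: scale so the largest |component| fits int16
        m = max(abs(x) for x in vec)
        return [x * 32767.0 / m for x in vec]
    # >= 1: p-norm (1 = taxicab/L1, 2 = Euclidean/L2, >2 = general p-norm)
    p = embd_norm
    norm = sum(abs(x) ** p for x in vec) ** (1.0 / p)
    return [x / norm for x in vec]

v = [3.0, -4.0]
print(normalize(v, 2))   # [0.6, -0.8]: unit Euclidean (L2) length
print(normalize(v, 1))   # components sum to 1 in absolute value (L1)
print(normalize(v, -1))  # [3.0, -4.0]: unchanged
```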
This parameter can be passed in the request body to llama-server, for example by replacing the --data line of the curl command above:
--data '{"input": "Hello embeddings", "embd_normalize": -1}' \
For llama-embedding, pass --embd-normalize <value>, for example:
llama-embedding -hf ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --embd-normalize -1 -p "Hello embeddings"
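Once embeddings are retrieved, semantic search reduces to comparing vectors, typically with cosine similarity. A minimal, self-contained Python sketch (the vectors here are toy values, not real model output):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With L2-normalized embeddings (embd_normalize=2, the default), every
# vector has unit length, so cosine similarity is simply the dot product.
query = [1.0, 0.0]
print(cosine_similarity(query, [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity(query, [0.0, 1.0]))  # 0.0 (orthogonal)
```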