
embeddinggemma-300m-qat-q8_0 GGUF

by ggml-org

Open source · 47k downloads · 15 likes

Rating: 1.5 (15 reviews) · Embedding · API & Local
About

embeddinggemma-300m-qat-q8_0 GGUF is an embedding model that converts text into dense numerical vectors for tasks such as semantic search, classification, and clustering. Thanks to its compact size (~300M parameters) and quantized GGUF format, it offers a strong balance of quality and efficiency, making it well suited to local or embedded deployments. It can produce embeddings normalized under several norms (L2, L1, etc.), allowing adaptation to specific application needs. The model remains accurate and fast even on modest hardware, and it is accessible through simple tools such as a local server or a command-line interface. It is particularly well suited for developers who want to add contextual-understanding features to intelligent systems without relying on costly cloud resources.
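As an illustrative sketch (not part of the model itself) of how such dense vectors are typically used for semantic search, documents can be ranked by cosine similarity to a query embedding:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # dot(a, b) / (||a|| * ||b||). For L2-normalized embeddings
    # (the server's default normalization) this reduces to a
    # plain dot product.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    # Return document indices sorted from most to least similar.
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

In practice, `query_vec` and each entry of `doc_vecs` would come from the model's embedding endpoint shown below.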

Documentation

embeddinggemma-300m-qat-q8_0 GGUF

Recommended way to run this model:

Sh
llama-server -hf ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --embeddings

Then the endpoint can be accessed at http://localhost:8080/embedding, for example using curl:

Console
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello embeddings"}' \
    --silent
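A minimal Python equivalent of the curl call, using only the standard library and assuming llama-server is running locally on port 8080. The exact shape of the JSON response depends on the llama-server version, so it is returned unparsed beyond JSON decoding:

```python
import json
import urllib.request

def build_payload(text, embd_normalize=None):
    # Request body accepted by llama-server's /embedding endpoint;
    # embd_normalize is optional and overrides the default L2 norm
    # (see the embd_normalize section below).
    payload = {"input": text}
    if embd_normalize is not None:
        payload["embd_normalize"] = embd_normalize
    return payload

def get_embedding(text, url="http://localhost:8080/embedding"):
    # POST the JSON payload, mirroring the curl call above.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```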

Alternatively, the llama-embedding command line tool can be used:

Sh
llama-embedding -hf ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --verbose-prompt -p "Hello embeddings"

embd_normalize

When a model uses pooling, or the pooling method is specified using --pooling, the normalization can be controlled by the embd_normalize parameter.

The default value is 2, which means that the embeddings are normalized using the Euclidean norm (L2). The other options are:

  • -1 No normalization
  • 0 Max absolute
  • 1 Taxicab
  • 2 Euclidean/L2
  • >2 P-Norm
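The options above can be sketched in Python. This is an illustrative reimplementation of the normalization choices, not llama.cpp's exact code, which may differ in details such as scaling in the max-absolute case:

```python
def normalize(values, embd_normalize=2):
    # Mirrors the embd_normalize options:
    #   -1 -> no normalization
    #    0 -> divide by the maximum absolute component
    #    1 -> taxicab (L1) norm
    #    2 -> Euclidean (L2) norm, the default
    #   >2 -> general p-norm
    if embd_normalize < 0:
        return list(values)
    if embd_normalize == 0:
        norm = max(abs(v) for v in values)
    else:
        p = embd_normalize
        norm = sum(abs(v) ** p for v in values) ** (1.0 / p)
    if norm == 0:
        return list(values)
    return [v / norm for v in values]
```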

This can be passed in the request body to llama-server, for example:

Sh
    --data '{"input": "Hello embeddings", "embd_normalize": -1}' \

And for llama-embedding, by passing --embd-normalize <value>, for example:

Sh
llama-embedding -hf ggml-org/embeddinggemma-300m-qat-q8_0-GGUF --embd-normalize -1 -p "Hello embeddings"
Capabilities & Tags
sentence-transformers · gguf · sentence-similarity · feature-extraction · endpoints_compatible
Specifications

  • Category: Embedding
  • Access: API & Local
  • License: Open Source
  • Pricing: Open Source
  • Rating: 1.5

Try embeddinggemma-300m-qat-q8_0 GGUF

Access the model directly