nomic embed text v2 moe distilled high quality

by cnmoro

Open source · 42k downloads · 4 likes

0.9

(4 reviews) · Embedding · API & Local

About

This model, *nomic embed text v2 moe distilled high quality*, is a distilled version of *nomic-embed-text-v2-moe* designed to generate high-quality text embeddings. It maps text to dense 768-dimensional vectors that capture semantics for search, classification, and similarity tasks. Its distillation process, which trains on 23 million data triplets, aims to improve quality while reducing complexity, making the model lighter and more efficient to run. Main use cases include information retrieval, document analysis, and comparison of text content, where precise, contextualized representations matter. What sets it apart is its distillation method, which combines the *Model2Vec* approach with training on a large dataset to balance quality and efficiency.

Documentation

This Model2Vec model was created by using Tokenlearn, with nomic-embed-text-v2-moe as a base.

The output dimension is 768.

The evaluation in the model card was run on this distilled model, not the original.

This model was not produced by a plain Model2Vec distillation. Embeddings were first generated with the original model for 23M triplets (msmarco), and the Tokenlearn model was then trained on them, with the nomic model as the base.
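The first step of that process (embedding the triplet texts with the original model) could look roughly like the sketch below. It is not the author's script: the helper names `unique_texts` and `teacher_targets`, the de-duplication step, and the batch size are all assumptions made for illustration, and `encode` stands for any function that maps a list of strings to a list of vectors.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Triplet = Tuple[str, str, str]  # (query, positive passage, negative passage)
Vector = Sequence[float]


def unique_texts(triplets: Iterable[Triplet]) -> List[str]:
    """Flatten triplets into a de-duplicated list of texts (first-seen order).

    De-duplicating is an assumption: the same query or passage can appear in
    many triplets, and it only needs to be embedded once.
    """
    seen = set()
    texts = []
    for triplet in triplets:
        for text in triplet:
            if text not in seen:
                seen.add(text)
                texts.append(text)
    return texts


def teacher_targets(
    texts: List[str],
    encode: Callable[[List[str]], Sequence[Vector]],
    batch_size: int = 256,  # hypothetical value
) -> Iterator[Tuple[str, Vector]]:
    """Yield (text, teacher embedding) pairs, encoding in fixed-size batches."""
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for text, vector in zip(batch, encode(batch)):
            yield text, vector
```

With the real teacher, `encode` would be the base model's own encoding function. How the resulting pairs are fed to Tokenlearn is not described in the source and is left out here.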

Usage

Load this model using the model2vec library:

Python

from model2vec import StaticModel

model = StaticModel.from_pretrained("cnmoro/nomic-embed-text-v2-moe-distilled-high-quality")

# Compute text embeddings
embeddings = model.encode(["Example sentence"])

Or using the sentence-transformers library:

Python

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('cnmoro/nomic-embed-text-v2-moe-distilled-high-quality')

# Compute text embeddings
embeddings = model.encode(["Example sentence"])
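For the search and similarity use cases mentioned above, the embeddings can be compared with cosine similarity. The following is a minimal sketch, assuming the vectors are NumPy arrays like those returned by `encode`; the helper name `rank_by_cosine` is hypothetical and not part of either library.

```python
import numpy as np


def rank_by_cosine(query_vec, doc_vecs):
    """Return (document index, cosine similarity) pairs, best match first."""
    query = np.asarray(query_vec, dtype=np.float64)  # shape (dim,)
    docs = np.asarray(doc_vecs, dtype=np.float64)    # shape (n_docs, dim)

    query_norm = np.linalg.norm(query)
    doc_norms = np.linalg.norm(docs, axis=1)

    # Clamp the denominator so an all-zero vector does not divide by zero.
    scores = (docs @ query) / np.maximum(doc_norms * query_norm, 1e-12)

    order = np.argsort(-scores)  # descending by similarity
    return [(int(i), float(scores[i])) for i in order]
```

With either loading method above, `doc_vecs` could come from `model.encode(documents)` and `query_vec` from `model.encode([query])[0]`, where `documents` and `query` are your own inputs.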
