
NV Embed v2

by nvidia

Open source · 67k downloads · 509 likes

Rating: 3.4 (509 reviews) · Embedding · API & Local

About

NV-Embed v2 is a general-purpose text embedding model that ranked first on the MTEB (Massive Text Embedding Benchmark) leaderboard as of Aug 30, 2024, with a score of 72.31 across 56 evaluation tasks. It is particularly strong in information retrieval applications (RAG), where it took first place with a score of 62.65 across 15 retrieval tasks, thanks to techniques such as attention over latent vectors and two-stage training to improve accuracy. The model also uses a hard-negative mining method that improves the relevance of results by removing false negatives. Designed for uses such as semantic search, classification, and clustering, NV-Embed v2 combines a Mistral-7B architecture with latent-attention pooling, which gives it an edge in embedding quality over existing models.

Documentation

Introduction

We present NV-Embed-v2, a generalist embedding model that ranks No. 1 on the Massive Text Embedding Benchmark (MTEB) as of Aug 30, 2024, with a score of 72.31 across 56 text embedding tasks. It also holds the No. 1 spot in the retrieval sub-category of the leaderboard (a score of 62.65 across 15 tasks), which is essential to the development of RAG technology.

NV-Embed-v2 introduces several new designs, including having the LLM attend to latent vectors for better pooled embedding output, and a two-stage instruction tuning method that improves the accuracy of both retrieval and non-retrieval tasks. Additionally, NV-Embed-v2 incorporates a novel hard-negative mining method that takes the positive relevance score into account to better remove false negatives.

For more technical details, refer to our paper: NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models.
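
To make the hard-negative mining idea concrete, here is a minimal sketch of filtering candidate negatives against the positive's relevance score. It is an illustration only: the function name `mine_hard_negatives`, the `max_ratio=0.95` ceiling, the `top_k` value, and the toy scores are all assumptions, not values taken from this model card.

```python
def mine_hard_negatives(candidates, positive_score, max_ratio=0.95, top_k=4):
    """Pick hard negatives for one query.

    candidates: list of (passage_id, score) pairs scored against the query.
    positive_score: score of the known positive passage for the same query.
    max_ratio: assumed ceiling; candidates scoring at or above
        max_ratio * positive_score are treated as likely false negatives.
    """
    ceiling = max_ratio * positive_score
    # Drop candidates that look too similar to the positive (possible false negatives).
    kept = [(pid, score) for pid, score in candidates if score < ceiling]
    # Keep the hardest remaining negatives, i.e. the highest-scoring ones.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Toy scores, made up for illustration.
candidates = [("p1", 0.91), ("p2", 0.78), ("p3", 0.52), ("p4", 0.84), ("p5", 0.30)]
negatives = mine_hard_negatives(candidates, positive_score=0.90, top_k=2)
print(negatives)  # [('p4', 0.84), ('p2', 0.78)]
```

In this toy run, "p1" scores higher than the positive itself, so it is discarded as a probable false negative rather than used as a training negative.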

Model Details

  • Base Decoder-only LLM: Mistral-7B-v0.1
  • Pooling Type: Latent-Attention
  • Embedding Dimension: 4096
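
The latent-attention pooling named above can be pictured with a toy NumPy sketch: token hidden states act as queries over a small array of latent vectors (used as both keys and values), the result passes through a small MLP, and the outputs are mean-pooled over non-padding tokens. This is a simplified, single-head illustration under stated assumptions: the random weights stand in for trained parameters, the toy sizes, the `1/sqrt(d)` scaling, and the tanh GELU approximation are choices made here, and the real model's implementation may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def latent_attention_pool(hidden, latents, w1, w2, mask):
    """hidden: (seq_len, d) token states; latents: (r, d) latent array;
    w1, w2: MLP weights; mask: (seq_len,) with 1 for real tokens, 0 for padding."""
    d = hidden.shape[-1]
    attn = softmax(hidden @ latents.T / np.sqrt(d))  # (seq_len, r): tokens attend to latents
    out = attn @ latents                             # (seq_len, d)
    out = gelu(out @ w1) @ w2                        # position-wise MLP
    weights = mask[:, None].astype(out.dtype)
    return (out * weights).sum(axis=0) / weights.sum()  # masked mean pooling


rng = np.random.default_rng(0)
seq_len, d, r = 6, 16, 4  # toy sizes; the model's embedding dimension is 4096
hidden = rng.normal(size=(seq_len, d))
latents = rng.normal(size=(r, d))
w1 = rng.normal(size=(d, 4 * d)) * 0.1
w2 = rng.normal(size=(4 * d, d)) * 0.1
mask = np.array([1, 1, 1, 1, 0, 0])  # last two positions are padding

embedding = latent_attention_pool(hidden, latents, w1, w2, mask)
print(embedding.shape)  # (16,)
```

Because each token only attends to the latent array, padded positions cannot influence the real tokens' outputs, and the mask removes them from the final average.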

How to use

Here is an example of how to encode queries and passages using HuggingFace Transformers and Sentence-Transformers. The required package versions are listed under "Required Packages" in the Troubleshooting section.

Usage (HuggingFace Transformers)

Python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Each query needs to be accompanied by a corresponding instruction describing the task.
task_name_to_instruct = {"example": "Given a question, retrieve passages that answer the question",}

query_prefix = "Instruct: "+task_name_to_instruct["example"]+"\nQuery: "
queries = [
    'are judo throws allowed in wrestling?', 
    'how to become a radiology technician in michigan?'
    ]

# No instruction needed for retrieval passages
passage_prefix = ""
passages = [
    "Since you're reading this, you are probably someone from a judo background or someone who is just wondering how judo techniques can be applied under wrestling rules. So without further ado, let's get to the question. Are Judo throws allowed in wrestling? Yes, judo throws are allowed in freestyle and folkstyle wrestling. You only need to be careful to follow the slam rules when executing judo throws. In wrestling, a slam is lifting and returning an opponent to the mat with unnecessary force.",
    "Below are the basic steps to becoming a radiologic technologist in Michigan:Earn a high school diploma. As with most careers in health care, a high school education is the first step to finding entry-level employment. Taking classes in math and science, such as anatomy, biology, chemistry, physiology, and physics, can help prepare students for their college studies and future careers.Earn an associate degree. Entry-level radiologic positions typically require at least an Associate of Applied Science. Before enrolling in one of these degree programs, students should make sure it has been properly accredited by the Joint Review Committee on Education in Radiologic Technology (JRCERT).Get licensed or certified in the state of Michigan."
]

# load model with tokenizer
model = AutoModel.from_pretrained('nvidia/NV-Embed-v2', trust_remote_code=True)

# get the embeddings
max_length = 32768
query_embeddings = model.encode(queries, instruction=query_prefix, max_length=max_length)
passage_embeddings = model.encode(passages, instruction=passage_prefix, max_length=max_length)

# normalize embeddings
query_embeddings = F.normalize(query_embeddings, p=2, dim=1)
passage_embeddings = F.normalize(passage_embeddings, p=2, dim=1)

# get the embeddings with DataLoader (splitting the datasets into multiple mini-batches)
# batch_size=2
# query_embeddings = model._do_encode(queries, batch_size=batch_size, instruction=query_prefix, max_length=max_length, num_workers=32, return_numpy=True)
# passage_embeddings = model._do_encode(passages, batch_size=batch_size, instruction=passage_prefix, max_length=max_length, num_workers=32, return_numpy=True)

scores = (query_embeddings @ passage_embeddings.T) * 100
print(scores.tolist())
# [[87.42693328857422, 0.46283677220344543], [0.965264618396759, 86.03721618652344]]
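
The score computed above is a cosine similarity scaled by 100: once both sides are L2-normalized, the matrix product is the cosine between each query and each passage. The dependency-free sketch below reproduces that arithmetic on toy 2-D vectors; the helper names and the toy vectors are illustrative assumptions, not part of the model's API.

```python
import math


def l2_normalize(vec):
    # Scale the vector to unit length (assumes it is not all zeros).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def scaled_scores(queries, passages, scale=100.0):
    # Normalize both sides, then take every query/passage dot product.
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in passages]
    return [[scale * sum(a * b for a, b in zip(qv, pv)) for pv in p] for qv in q]


toy_queries = [[3.0, 4.0], [1.0, 0.0]]
toy_passages = [[6.0, 8.0], [0.0, 2.0]]
toy_scores = scaled_scores(toy_queries, toy_passages)
print([[round(s, 2) for s in row] for row in toy_scores])
# [[100.0, 80.0], [60.0, 0.0]]
```

The first toy query points in the same direction as the first toy passage, so it scores 100 regardless of vector length; orthogonal vectors score 0.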

Usage (Sentence-Transformers)

Python
import torch
from sentence_transformers import SentenceTransformer

# Each query needs to be accompanied by a corresponding instruction describing the task.
task_name_to_instruct = {"example": "Given a question, retrieve passages that answer the question",}

query_prefix = "Instruct: "+task_name_to_instruct["example"]+"\nQuery: "
queries = [
    'are judo throws allowed in wrestling?', 
    'how to become a radiology technician in michigan?'
    ]

# No instruction needed for retrieval passages
passages = [
    "Since you're reading this, you are probably someone from a judo background or someone who is just wondering how judo techniques can be applied under wrestling rules. So without further ado, let's get to the question. Are Judo throws allowed in wrestling? Yes, judo throws are allowed in freestyle and folkstyle wrestling. You only need to be careful to follow the slam rules when executing judo throws. In wrestling, a slam is lifting and returning an opponent to the mat with unnecessary force.",
    "Below are the basic steps to becoming a radiologic technologist in Michigan:Earn a high school diploma. As with most careers in health care, a high school education is the first step to finding entry-level employment. Taking classes in math and science, such as anatomy, biology, chemistry, physiology, and physics, can help prepare students for their college studies and future careers.Earn an associate degree. Entry-level radiologic positions typically require at least an Associate of Applied Science. Before enrolling in one of these degree programs, students should make sure it has been properly accredited by the Joint Review Committee on Education in Radiologic Technology (JRCERT).Get licensed or certified in the state of Michigan."
]

# load model with tokenizer
model = SentenceTransformer('nvidia/NV-Embed-v2', trust_remote_code=True)
model.max_seq_length = 32768
model.tokenizer.padding_side = "right"

def add_eos(input_examples):
    # append the tokenizer's EOS token to every input string
    return [input_example + model.tokenizer.eos_token for input_example in input_examples]

# get the embeddings
batch_size = 2
query_embeddings = model.encode(add_eos(queries), batch_size=batch_size, prompt=query_prefix, normalize_embeddings=True)
passage_embeddings = model.encode(add_eos(passages), batch_size=batch_size, normalize_embeddings=True)

scores = (query_embeddings @ passage_embeddings.T) * 100
print(scores.tolist())
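
Once a score matrix is available, retrieval comes down to ranking passages per query. The sketch below does this in plain Python, using the score values printed in the HuggingFace Transformers example as sample input; the helper name `rank_passages` is hypothetical.

```python
def rank_passages(scores, top_k=1):
    """For each query row, return the indices of the top_k highest-scoring passages."""
    ranked = []
    for row in scores:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        ranked.append(order[:top_k])
    return ranked


# Scores printed by the HuggingFace Transformers example (rows: queries, columns: passages).
example_scores = [
    [87.42693328857422, 0.46283677220344543],
    [0.965264618396759, 86.03721618652344],
]
print(rank_passages(example_scores))  # [[0], [1]]
```

Each query's best match is the passage at the same index, which is what the near-diagonal score matrix suggests.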

License

This model should not be used for any commercial purpose. Refer to the license for the detailed terms.

For commercial purposes, we recommend using the models available through NeMo Retriever Microservices (NIMs).

Correspondence to

Chankyu Lee, Rajarshi Roy, Wei Ping

Citation

If you find this code useful in your research, please consider citing:

Bibtex
@article{lee2024nv,
  title={NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models},
  author={Lee, Chankyu and Roy, Rajarshi and Xu, Mengyao and Raiman, Jonathan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2405.17428},
  year={2024}
}
Bibtex
@article{moreira2024nv,
  title={NV-Retriever: Improving text embedding models with effective hard-negative mining},
  author={Moreira, Gabriel de Souza P and Osmulski, Radek and Xu, Mengyao and Ak, Ronay and Schifferer, Benedikt and Oldridge, Even},
  journal={arXiv preprint arXiv:2407.15831},
  year={2024}
}

Troubleshooting

1. Instruction template for MTEB benchmarks

For the MTEB retrieval, STS, and summarization sub-tasks, please use the instruction prefix template in instructions.json. For classification, clustering, and reranking, please use the instructions provided in Table 7 of the NV-Embed paper.
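
As a sketch of how such an instruction is turned into the query prefix, the helper below applies the "Instruct: ...\nQuery: " template from the usage examples. The function names and the task-name key are hypothetical, the dictionary is not the content of instructions.json, and only the one instruction string shown in the usage code is included.

```python
# Hypothetical mapping; only this instruction text appears in the usage examples.
TASK_TO_INSTRUCTION = {
    "example": "Given a question, retrieve passages that answer the question",
}


def build_query_prefix(task_name, task_to_instruction=TASK_TO_INSTRUCTION):
    # Queries carry the task instruction in the "Instruct: ...\nQuery: " template.
    instruction = task_to_instruction[task_name]
    return "Instruct: " + instruction + "\nQuery: "


def build_passage_prefix():
    # Retrieval passages are encoded without any instruction.
    return ""


prefix = build_query_prefix("example")
print(repr(prefix))
```

The resulting string is what the usage examples pass as `instruction=` (HuggingFace Transformers) or `prompt=` (Sentence-Transformers).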

2. Required Packages

If you run into problems, try installing the following Python packages:

Shell
pip uninstall -y transformer-engine
pip install torch==2.2.0
pip install transformers==4.42.4
pip install flash-attn==2.2.0
pip install sentence-transformers==2.7.0
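
To check whether the current environment matches these pins, a small standard-library script can compare installed versions against them. This is a convenience sketch, not part of the model's tooling: the helper names are hypothetical, and stripping a local suffix such as "+cu121" before comparing is an assumption about how builds are labelled.

```python
from importlib import metadata

# Versions taken from the pip commands above.
PINNED = {
    "torch": "2.2.0",
    "transformers": "4.42.4",
    "flash-attn": "2.2.0",
    "sentence-transformers": "2.7.0",
}


def base_version(version):
    # Drop a local build suffix, e.g. "2.2.0+cu121" -> "2.2.0".
    return version.split("+")[0]


def check_versions(pinned):
    """Return {package: (wanted, found_or_None, matches)} for each pinned package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        matches = found is not None and base_version(found) == wanted
        report[name] = (wanted, found, matches)
    return report


for name, (wanted, found, ok) in check_versions(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: wanted {wanted}, found {found or 'not installed'} -> {status}")
```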

3. How to enable multi-GPU (note: this applies to the HuggingFace Transformers usage)

Python
from transformers import AutoModel
from torch.nn import DataParallel

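# trust_remote_code=True matches the loading call in the usage example above
embedding_model = AutoModel.from_pretrained("nvidia/NV-Embed-v2", trust_remote_code=True)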
for module_key, module in embedding_model._modules.items():
    embedding_model._modules[module_key] = DataParallel(module)

4. Fixing "nvidia/NV-Embed-v2 is not the path to a directory containing a file named config.json"

Go to your local model directory, open config.json, and set the value of "_name_or_path" to your local model path.
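
The same edit can be scripted. The sketch below rewrites every "_name_or_path" entry that still holds the hub name so that it points at the local directory. The function names are hypothetical, and handling nested entries is an assumption about how the file may be laid out; inspect your own config.json before relying on it.

```python
import json
from pathlib import Path


def replace_name_or_path(node, old, new):
    """Recursively replace "_name_or_path" values equal to `old`; return how many changed."""
    count = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "_name_or_path" and value == old:
                node[key] = new
                count += 1
            else:
                count += replace_name_or_path(value, old, new)
    elif isinstance(node, list):
        for item in node:
            count += replace_name_or_path(item, old, new)
    return count


def point_config_to_local_path(model_dir, hub_name="nvidia/NV-Embed-v2"):
    """Rewrite config.json inside model_dir so it refers to the local path."""
    model_dir = Path(model_dir).resolve()
    config_path = model_dir / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    changed = replace_name_or_path(config, hub_name, str(model_dir))
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return changed


# Example call (hypothetical path):
# point_config_to_local_path("/path/to/local/NV-Embed-v2")
```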

5. Access to model nvidia/NV-Embed-v2 is restricted. You must be authenticated to access it

Run "huggingface-cli login" and authenticate with your Hugging Face access token.

6. How to resolve a slight mismatch in Sentence-Transformers results

The slight mismatch in the Sentence-Transformers results is caused by a discrepancy in how the Sentence-Transformers package calculates the length of the instruction prefix.

To fix this issue, build the Sentence-Transformers package from source with the modification shown below.

Shell
git clone https://github.com/UKPLab/sentence-transformers.git
cd sentence-transformers
git checkout v2.7-release
# Modify L353 in SentenceTransformer.py to: extra_features["prompt_length"] = tokenized_prompt["input_ids"].shape[-1]
pip install -e .

Specifications

  • Category: Embedding
  • Access: API & Local
  • License: Open Source
  • Pricing: Open Source
  • Rating: 3.4
