
voyage 4 nano

by voyageai

Open source · 75k downloads · 107 likes

2.5 (107 reviews) · Embedding · API & Local
About

Voyage 4 nano is a cutting-edge text embedding model developed by Voyage AI, optimized for high-performance semantic search and information retrieval. Multilingual and supporting a context length of 32,000 tokens, it generates high-quality vector representations while remaining lightweight and cost-efficient. Because it shares an embedding space with the other models in the Voyage 4 series, its embeddings can be used interchangeably with theirs without reindexing, simplifying transitions between models. Advanced features such as Matryoshka representation learning and quantization-aware training provide flexibility in embedding dimensions (2048, 1024, 512, or 256) and numerical precision (32-bit float, 8-bit integer, or binary). Ideal for local applications, prototyping, or large-scale deployments that must balance performance and cost, it suits both developers and enterprises building search or recommendation systems.

Documentation

voyage-4-nano

Model Overview

voyage-4-nano is a state-of-the-art text embedding model from the Voyage 4 series, designed for high-performance semantic search and retrieval tasks. This model features:

  • Developed by: Voyage AI
  • Supported Language(s): Multilingual
  • Context Length: 32,000 tokens
  • Parameters: 180M [Non-embedding] + 160M [Embedding]
  • License: Apache 2.0

For detailed performance metrics and benchmarks, please refer to:

  • 📝 Voyage-4 Release Blog Post
  • 📊 Evaluation Spreadsheet

Key Features

Shared Embedding Space with voyage-4 series

The shared embedding space introduced in the Voyage 4 model series eliminates the need to re-index your data when switching between models in the series. Embeddings generated by different Voyage 4 models (voyage-4-large, voyage-4, voyage-4-lite, and voyage-4-nano) can be directly compared and used interchangeably. For example, use voyage-4-large for high-fidelity indexing, voyage-4-lite for high-throughput queries, and voyage-4-nano for local development.
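As a toy illustration of what the shared space buys you, the retrieval arithmetic below would work unchanged if the index vectors came from voyage-4-large and the query vector from voyage-4-nano. The vectors here are made-up stand-ins, not real model outputs:

```python
import numpy as np

# Hypothetical stand-ins: pretend doc_index was built with voyage-4-large at
# indexing time and query_vec was produced by voyage-4-nano at query time.
doc_index = np.array([
    [0.6, 0.8, 0.0],   # doc 0
    [0.0, 0.6, 0.8],   # doc 1
])
query_vec = np.array([0.8, 0.6, 0.0])

# L2-normalize both sides; cosine similarity is then a plain dot product,
# and no re-indexing step is needed when the query model changes.
doc_index = doc_index / np.linalg.norm(doc_index, axis=1, keepdims=True)
query_vec = query_vec / np.linalg.norm(query_vec)

scores = doc_index @ query_vec
best = int(np.argmax(scores))
print(best, scores)  # doc 0 is the closer match
```

Without a shared space, switching query models would require re-embedding every document in `doc_index` with the new model first.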

Frontier Retrieval Quality at Low Cost

Outperforms much larger existing embedding models, including voyage-3.5-lite.

Matryoshka Representation Learning (MRL)

voyage-4-nano is trained with Matryoshka Representation Learning to enable flexible embedding dimensions with minimal loss of retrieval quality. It supports 2048, 1024, 512, and 256 dimensional embeddings.
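With MRL, a lower-dimensional embedding is obtained by keeping the leading components of the full 2048-d vector and re-normalizing, which is the usual way truncation options like `truncate_dim` are applied downstream. A minimal numpy sketch of that post-processing step (the random vector stands in for a real model output):

```python
import numpy as np

def truncate_embedding(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the leading `dim` components and re-normalize to unit length."""
    assert dim in (2048, 1024, 512, 256), "dimensions supported by voyage-4-nano"
    truncated = emb[:dim]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a real 2048-d embedding
rng = np.random.default_rng(0)
full = rng.standard_normal(2048)
full /= np.linalg.norm(full)

small = truncate_embedding(full, 512)
print(small.shape)  # -> (512,), and unit norm
```

Because the model is trained so that prefixes of the vector are themselves good embeddings, this truncation costs far less quality than it would for a conventionally trained model.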

Quantization-Aware Training

voyage-4-nano uses quantization-aware training to enable flexible output data types with minimal loss of retrieval quality. It supports 32-bit floating point, signed and unsigned 8-bit integer, and binary precision outputs.
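For intuition, here is what binary-precision embeddings look like downstream: sign-threshold the float components, pack 8 dimensions per byte, and compare with Hamming distance. This is a generic sketch of the technique, not necessarily the exact packing the Voyage stack uses; the random vectors are stand-ins for real embeddings.

```python
import numpy as np

def to_binary(emb: np.ndarray) -> np.ndarray:
    """Binarize an embedding: 1 where the component is positive, packed 8 dims per byte."""
    bits = (emb > 0).astype(np.uint8)
    return np.packbits(bits)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two packed binary embeddings (XOR, then popcount)."""
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(1)
q = rng.standard_normal(2048)
d = q + 0.1 * rng.standard_normal(2048)   # a vector similar to q
u = rng.standard_normal(2048)             # an unrelated vector

# A 2048-d embedding packs into 256 bytes: 8x smaller than int8, 32x than float32.
print(to_binary(q).shape)  # -> (256,)
print(hamming(to_binary(q), to_binary(d)) < hamming(to_binary(q), to_binary(u)))
```

Quantization-aware training is what keeps these coarse representations usable: the model learns embeddings whose signs already carry most of the ranking signal.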

Usage

Via Transformers

Python
import torch
from transformers import AutoModel, AutoTokenizer


def mean_pool(
    last_hidden_states: torch.Tensor, attention_mask: torch.Tensor
) -> torch.Tensor:
    input_mask_expanded = (
        attention_mask.unsqueeze(-1).expand(last_hidden_states.size()).float()
    )
    sum_embeddings = torch.sum(last_hidden_states * input_mask_expanded, 1)
    sum_mask = input_mask_expanded.sum(1)
    sum_mask = torch.clamp(sum_mask, min=1e-9)
    output_vectors = sum_embeddings / sum_mask
    return output_vectors


# On NVIDIA GPUs, it's recommended to use exactly these arguments
# (flash_attention_2 with bfloat16). attn_implementation="eager" or "sdpa"
# also works, but minor differences in the embeddings are expected.

device = "cuda"
model = AutoModel.from_pretrained(
    "voyageai/voyage-4-nano",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    dtype=torch.bfloat16,
).to(device)
tokenizer = AutoTokenizer.from_pretrained("voyageai/voyage-4-nano")

# Embed queries with prompts
query = "What is the fastest route to 88 Kearny?"
prompt = "Represent the query for retrieving supporting documents: "
inputs = tokenizer(
    prompt + query, return_tensors="pt", padding=True, truncation=True, max_length=32768
)
inputs = {k: v.to(device) for k, v in inputs.items()}
with torch.no_grad():
    outputs = model.forward(**inputs)
embeddings = mean_pool(outputs.last_hidden_state, inputs["attention_mask"])
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)

Via Sentence Transformers

Python
from sentence_transformers import SentenceTransformer
import torch

# Standard loading, assuming no GPU access
model = SentenceTransformer(
    "voyageai/voyage-4-nano", 
    trust_remote_code=True, 
    truncate_dim=2048
)

# OPTIONAL: Loading for high-performance inference with GPUs
# Use 'flash_attention_2' and 'bfloat16' if your GPU supports it (e.g., A100, H100, RTX 30/40 series)
# model = SentenceTransformer(
#     "voyageai/voyage-4-nano", 
#     trust_remote_code=True, 
#     truncate_dim=2048, 
#     model_kwargs={
#         "attn_implementation": "flash_attention_2",
#         "dtype": torch.bfloat16
#     }
# )

query = "Which planet is known as the Red Planet?"
documents = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

# Encode via encode_query and encode_document to automatically use the right prompts
query_embedding = model.encode_query(query)
document_embeddings = model.encode_document(documents)

# Inspect the output shapes
print(f"Query Shape: {query_embedding.shape}")      # Expected: (2048,)
print(f"Document Shape: {document_embeddings.shape}") # Expected: (4, 2048)
  • The encode_query and encode_document methods automatically prepend the "Represent the query for retrieving supporting documents: " and "Represent the document for retrieval: " prompts as defined in config_sentence_transformers.json, respectively.
  • The default embedding dimension is 2048. To obtain lower-dimensional embeddings, you can use the truncate_dim argument in the encode_query and encode_document methods, or when initializing the model via the truncate_dim parameter. For example, model.encode_query(query, truncate_dim=512) will yield 512-dimensional embeddings. The model supports 2048, 1024, 512, and 256-dimensional embeddings.
  • You can post-process the embeddings to lower quantization levels using the precision argument in the encode_query and encode_document methods. For example, model.encode_query(query, precision='int8') will yield signed 8-bit integer embeddings. The supported precisions are 'float32', 'int8', 'uint8', 'binary', and 'ubinary'.
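For intuition about the int8 path, here is a simplified post-processing sketch using per-dimension min/max calibration. This is a generic illustration, not the exact calibration sentence-transformers applies (libraries typically calibrate ranges on a larger sample than the batch being quantized); the random matrix stands in for real embeddings.

```python
import numpy as np

def quantize_int8(embs: np.ndarray):
    """Map float32 embeddings to int8 with per-dimension min/max calibration."""
    lo = embs.min(axis=0)
    hi = embs.max(axis=0)
    scale = (hi - lo) / 255.0
    q = np.round((embs - lo) / scale - 128.0).astype(np.int8)
    return q, lo, scale

def dequantize_int8(q: np.ndarray, lo: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Approximate reconstruction of the original floats."""
    return (q.astype(np.float32) + 128.0) * scale + lo

# Stand-in batch: 8 embeddings of dimension 256
rng = np.random.default_rng(2)
embs = rng.standard_normal((8, 256)).astype(np.float32)

q, lo, scale = quantize_int8(embs)
recon = dequantize_int8(q, lo, scale)
print(q.dtype, float(np.abs(embs - recon).max()))  # int8, small reconstruction error
```

The 4x storage saving over float32 comes with a per-component error of at most half a quantization step; quantization-aware training is what keeps that error from degrading retrieval quality.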

Via vLLM

Python
"""
Example: Run voyage-4-nano on vLLM and compare output embeddings with HuggingFace.

Requires: pip install vllm==0.16.0 sentence-transformers
"""

import torch
import torch.nn.functional as F
import json

from vllm import LLM
from vllm.config import PoolerConfig

query = "Which planet is known as the Red Planet?"
documents = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]


def get_hf_result(input):
    """Get embeddings from the HuggingFace SentenceTransformer pipeline as a reference."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(
        "voyageai/voyage-4-nano", trust_remote_code=True, truncate_dim=2048
    )
    if isinstance(input, str):
        return model.encode_query(input).tolist()

    if isinstance(input, list):
        return model.encode_document(input).tolist()


def compare_embeddings(a, b):
    """Compare two embedding vectors and print detailed metrics."""
    a = torch.tensor(a, dtype=torch.float32)
    b = torch.tensor(b, dtype=torch.float32)

    norm_a = a.norm(p=2).item()
    norm_b = b.norm(p=2).item()

    an = F.normalize(a, p=2, dim=0)
    bn = F.normalize(b, p=2, dim=0)

    cosine = torch.dot(an, bn).item()
    l2 = (a - b).norm(p=2).item()
    l2_normed = (an - bn).norm(p=2).item()
    mae = (a - b).abs().mean().item()
    max_abs = (a - b).abs().max().item()

    ret = {
        "dim_a": a.numel(),
        "dim_b": b.numel(),
        "norm_a": norm_a,
        "norm_b": norm_b,
        "cosine_similarity": cosine,
        "l2_distance_raw": l2,
        "l2_distance_normalized": l2_normed,
        "mae": mae,
        "max_abs_diff": max_abs,
    }
    print("Compare the embeddings:", json.dumps(ret, indent=2))
    return ret


def example():
    # voyage-4-nano uses task-specific prompts for queries vs documents
    query_prompt = "Represent the query for retrieving supporting documents: "
    doc_prompt = "Represent the document for retrieval: "

    llm = LLM(
        model="voyageai/voyage-4-nano",
        runner="pooling",
        convert="embed",
        hf_overrides={
            # Use the bidirectional embedding architecture for voyage models
            "architectures": ["VoyageQwen3BidirectionalEmbedModel"],
        },
        trust_remote_code=True,
        dtype="bfloat16",
        max_model_len=32768,
        gpu_memory_utilization=0.5,
        enforce_eager=True,
        pooler_config=PoolerConfig(
            pooling_type="MEAN",
        ),
        enable_mfu_metrics=False,
        disable_log_stats=False,
    )

    # --- Query embedding ---
    query_emb = llm.embed([query_prompt + query])[0].outputs.embedding
    compare_embeddings(query_emb, get_hf_result(query))

    # --- Document embeddings (batched) ---
    embs = llm.embed([doc_prompt + doc for doc in documents])
    doc_embs = [e.outputs.embedding for e in embs]

    doc_embs_hf = get_hf_result(documents)
    for i in range(len(doc_embs)):
        compare_embeddings(doc_embs[i], doc_embs_hf[i])


if __name__ == "__main__":
    example()

Acknowledgments

This model builds upon foundational work by the Qwen Team at Alibaba. We are grateful for their contributions to the open-source community, which have informed the development of this specialized embedding model for the Voyage 4 series.

We'd like to thank Tom Aarsen for adding Sentence Transformers support and improving the Transformers integration.

Capabilities & Tags

sentence-transformers · safetensors · qwen3 · text-generation · transformers · feature-extraction · custom_code · multilingual · text-embeddings-inference · endpoints_compatible
Specifications

  • Category: Embedding
  • Access: API & Local
  • License: Open Source
  • Pricing: Open Source
  • Rating: 2.5
