Jina Code Embeddings: A Small but Performant Code Embedding Model

Intended Usage & Model Info

jina-code-embeddings is an embedding model for code retrieval. The model supports various types of code retrieval (text-to-code, code-to-code, code-to-text, code-to-completion) and technical question answering across 15+ programming languages.

Built on Qwen/Qwen2.5-Coder-0.5B, jina-code-embeddings-0.5b features:

Multilingual support (15+ programming languages) and compatibility with a wide range of domains, including web development, software development, machine learning, data science, and educational coding problems.
Task-specific instruction prefixes for NL2Code, Code2Code, Code2NL, Code2Completion, and Technical QA, which can be selected at inference time.
Flexible embedding size: dense embeddings are 896-dimensional by default but can be truncated to as low as 64 with minimal performance loss.

Summary of features:

Feature	Jina Code Embeddings 0.5B
Base Model	Qwen2.5-Coder-0.5B
Supported Tasks	`nl2code`, `code2code`, `code2nl`, `code2completion`, `qa`
Model DType	BFloat16
Max Sequence Length	32768
Embedding Vector Dimension	896
Matryoshka dimensions	64, 128, 256, 512, 896
Pooling Strategy	Last-token pooling
Attention Mechanism	FlashAttention2
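
The Matryoshka dimensions in the table can be used by keeping only the leading components of an embedding and re-normalizing before computing cosine similarity. The following is a minimal sketch in plain Python, not part of the model's own API: `truncate_embedding` is a hypothetical helper name, and the 896-dimensional vector is toy data standing in for a real model output.

```python
import math

# Dimensions listed in the feature table above.
MATRYOSHKA_DIMS = (64, 128, 256, 512, 896)

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-normalize them.

    Hypothetical helper for illustration; `vec` is a plain list of floats.
    """
    if dim > len(vec):
        raise ValueError(f"cannot truncate a {len(vec)}-d vector to {dim} dimensions")
    head = vec[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    # Guard against an all-zero prefix to avoid dividing by zero.
    return [v / norm for v in head] if norm > 0 else head

# Toy 896-d vector standing in for a real embedding.
full = [float(i % 7) for i in range(896)]
small = truncate_embedding(full, MATRYOSHKA_DIMS[0])
print(len(small))  # 64
```

Smaller dimensions reduce storage and speed up similarity search; the trade-off against retrieval quality is best checked on your own data.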

Usage

Requirements

The following Python packages are required:

transformers>=4.53.0
torch>=2.7.1

Optional / Recommended

flash-attention: Installing flash-attention is recommended for improved inference speed and efficiency, but not mandatory.
sentence-transformers: If you want to use the model via the sentence-transformers interface, install this package as well.
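
Because flash-attention is optional, loading code can check for it and fall back when it is missing. The sketch below uses only the standard library; `pick_attn_implementation` is a hypothetical helper, and the `"sdpa"` fallback is an assumption based on the attention implementations that Transformers accepts, not something this model card prescribes. It only checks that the package is importable, not that a compatible GPU is present.

```python
import importlib.util

def pick_attn_implementation():
    """Return "flash_attention_2" if flash-attn is importable, else "sdpa".

    Hypothetical helper for illustration only.
    """
    # The flash-attention package installs under the import name `flash_attn`.
    has_flash = importlib.util.find_spec("flash_attn") is not None
    return "flash_attention_2" if has_flash else "sdpa"

attn_impl = pick_attn_implementation()
print(attn_impl)
```

The returned string could then be passed as the `attn_implementation` value when loading the model.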

via transformers

Python

# !pip install "transformers>=4.53.0" "torch>=2.7.1"

import torch
import torch.nn.functional as F

from transformers import AutoModel, AutoTokenizer

INSTRUCTION_CONFIG = {
    "nl2code": {
        "query": "Find the most relevant code snippet given the following query:\n",
        "passage": "Candidate code snippet:\n"
    },
    "qa": {
        "query": "Find the most relevant answer given the following question:\n",
        "passage": "Candidate answer:\n"
    },
    "code2code": {
        "query": "Find an equivalent code snippet given the following code snippet:\n",
        "passage": "Candidate code snippet:\n"
    },
    "code2nl": {
        "query": "Find the most relevant comment given the following code snippet:\n",
        "passage": "Candidate comment:\n"
    },
    "code2completion": {
        "query": "Find the most relevant completion given the following start of code snippet:\n",
        "passage": "Candidate completion:\n"
    }
}

MAX_LENGTH = 8192

def cosine_similarity(x,y):
    x = F.normalize(x, p=2, dim=1)
    y = F.normalize(y, p=2, dim=1)
    return x @ y.T

def last_token_pool(last_hidden_states, attention_mask):
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

def add_instruction(instruction, query):
    return f'{instruction}{query}'

# The queries and documents to embed
queries = [
    add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "print hello world in python"),
    add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "initialize array of 5 zeros in c++")
]
documents = [
    add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "print('Hello World!')"),
    add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "int arr[5] = {0, 0, 0, 0, 0};")
]
all_inputs = queries + documents

tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-code-embeddings-0.5b')
model = AutoModel.from_pretrained('jinaai/jina-code-embeddings-0.5b')

batch_dict = tokenizer(
    all_inputs,
    padding=True,
    truncation=True,
    max_length=MAX_LENGTH,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
query_embeddings = embeddings[:2]
passage_embeddings = embeddings[2:]

# Compute the (cosine) similarity between the query and document embeddings
scores = cosine_similarity(query_embeddings, passage_embeddings)
print(scores)
# tensor([[0.8168, 0.1236],
#         [0.1204, 0.5525]], grad_fn=<MmBackward0>)
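
To make the pooling step in `last_token_pool` easier to follow, here is a toy illustration in plain Python of which position it selects for each sequence. It mirrors the same left-padding check on hand-written attention masks; `last_token_indices` is a hypothetical name used only for this sketch.

```python
def last_token_indices(attention_mask):
    """Return, per row, the position of the last non-padding token.

    `attention_mask` is a list of rows of 0/1 ints, as a tokenizer would produce.
    """
    # If every row ends in a real token, the batch is left-padded (or unpadded),
    # so the final position is the last token for every row.
    left_padded = all(row[-1] == 1 for row in attention_mask)
    if left_padded:
        return [len(row) - 1 for row in attention_mask]
    # Right padding: the last real token sits at (number of real tokens - 1).
    return [sum(row) - 1 for row in attention_mask]

right_padded = [[1, 1, 1, 0, 0],
                [1, 1, 1, 1, 1]]
left_padded = [[0, 0, 1, 1, 1],
               [1, 1, 1, 1, 1]]
print(last_token_indices(right_padded))  # [2, 4]
print(last_token_indices(left_padded))   # [4, 4]
```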

via sentence-transformers

Python

# !pip install "sentence_transformers>=5.0.0" "torch>=2.7.1"

import torch
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer(
    "jinaai/jina-code-embeddings-0.5b",
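    model_kwargs={
        "torch_dtype": torch.bfloat16,
        # FlashAttention2 needs the optional flash-attention package and a CUDA GPU;
        # drop this entry to fall back to the default attention implementation.
        "attn_implementation": "flash_attention_2",
        "device_map": "cuda"
    },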
    tokenizer_kwargs={"padding_side": "left"},
)

# The queries and documents to embed
queries = [
    "print hello world in python",
    "initialize array of 5 zeros in c++"
]
documents = [
    "print('Hello World!')",
    "int arr[5] = {0, 0, 0, 0, 0};"
]

query_embeddings = model.encode(queries, prompt_name="nl2code_query")
document_embeddings = model.encode(documents, prompt_name="nl2code_document")

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.8169, 0.1214],
#         [0.1190, 0.5500]])
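
A similarity matrix like the one printed above is typically turned into a ranking by sorting each query's row. The sketch below does this in plain Python, with the scores copied from that printed output as nested lists; `rank_documents` is a hypothetical helper. With a real tensor, calling `.tolist()` first yields the same nested-list shape.

```python
def rank_documents(similarity_row, documents):
    """Return (document, score) pairs sorted by descending score."""
    pairs = zip(documents, similarity_row)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

documents = [
    "print('Hello World!')",
    "int arr[5] = {0, 0, 0, 0, 0};",
]
# Scores copied from the printed similarity matrix (rows = queries).
similarity = [[0.8169, 0.1214],
              [0.1190, 0.5500]]

for row in similarity:
    best_doc, best_score = rank_documents(row, documents)[0]
    print(best_doc, best_score)
```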

via vLLM

Python


import torch
import torch.nn.functional as F
from vllm import LLM

INSTRUCTION_CONFIG = {
    "nl2code": {
        "query": "Find the most relevant code snippet given the following query:\n",
        "passage": "Candidate code snippet:\n"
    },
    "qa": {
        "query": "Find the most relevant answer given the following question:\n",
        "passage": "Candidate answer:\n"
    },
    "code2code": {
        "query": "Find an equivalent code snippet given the following code snippet:\n",
        "passage": "Candidate code snippet:\n"
    },
    "code2nl": {
        "query": "Find the most relevant comment given the following code snippet:\n",
        "passage": "Candidate comment:\n"
    },
    "code2completion": {
        "query": "Find the most relevant completion given the following start of code snippet:\n",
        "passage": "Candidate completion:\n"
    }
}

def add_instruction(instruction, text):
    return f"{instruction}{text}"

def cosine_similarity(x, y):
    x = F.normalize(x, p=2, dim=1)
    y = F.normalize(y, p=2, dim=1)
    return x @ y.T

# Build the queries and documents
queries = [
    add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "print hello world in python"),
    add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "initialize array of 5 zeros in c++"),
]
documents = [
    add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "print('Hello World!')"),
    add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "int arr[5] = {0, 0, 0, 0, 0};"),
]
all_inputs = queries + documents

# vLLM embedding model
llm = LLM(
    model="jinaai/jina-code-embeddings-0.5b",
    task="embed"
)

# Encode with vLLM
outputs = llm.encode(all_inputs)

# Collect embeddings into a single tensor
emb_list = []
for out in outputs:
    vec = out.outputs.data.detach()
    emb_list.append(vec)
embeddings = torch.stack(emb_list, dim=0)

# Split into query and passage embeddings
n_q = len(queries)
query_embeddings = embeddings[:n_q]
passage_embeddings = embeddings[n_q:]

# Cosine similarity matrix (queries x documents)
scores = cosine_similarity(query_embeddings, passage_embeddings)
print(scores)
# tensor([[0.8171, 0.1230],
#         [0.1207, 0.5513]])
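
The examples above all use the `nl2code` prefixes, but the task can be switched at inference time by choosing a different entry of `INSTRUCTION_CONFIG`. The following sketch wraps that lookup in a small helper; `prepare_inputs` is a hypothetical name, and only two tasks are repeated here so the snippet stays self-contained.

```python
# Subset of the INSTRUCTION_CONFIG shown above, repeated for self-containment.
INSTRUCTION_CONFIG = {
    "nl2code": {
        "query": "Find the most relevant code snippet given the following query:\n",
        "passage": "Candidate code snippet:\n",
    },
    "qa": {
        "query": "Find the most relevant answer given the following question:\n",
        "passage": "Candidate answer:\n",
    },
}

def prepare_inputs(task, texts, role):
    """Prepend the instruction prefix for `task` and `role` to each text.

    `role` is "query" or "passage". Hypothetical helper for illustration.
    """
    if task not in INSTRUCTION_CONFIG:
        raise KeyError(f"unknown task {task!r}; choose from {sorted(INSTRUCTION_CONFIG)}")
    if role not in ("query", "passage"):
        raise ValueError("role must be 'query' or 'passage'")
    prefix = INSTRUCTION_CONFIG[task][role]
    return [f"{prefix}{text}" for text in texts]

qa_queries = prepare_inputs("qa", ["How do I reverse a list in Python?"], "query")
qa_passages = prepare_inputs("qa", ["Use reversed(xs) or xs[::-1]."], "passage")
print(qa_queries[0])
```

Queries and passages should use the prefixes of the same task, as in the examples above.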

Citation

Please refer to our technical report on jina-code-embeddings for training details and benchmarks. If you find the model useful in your research, please cite the following paper:

BibTeX

@misc{kryvosheieva2025efficientcodeembeddingscode,
      title={Efficient Code Embeddings from Code Generation Models}, 
      author={Daria Kryvosheieva and Saba Sturua and Michael Günther and Scott Martens and Han Xiao},
      year={2025},
      eprint={2508.21290},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.21290}, 
}

Contact

Join our Discord community and chat with other community members about ideas.

