e5 base sts en de

Rating: 1.6 (17 reviews)

by danielheinz

Open source · 160k downloads · 17 likes

Embedding · API & Local

About

This model, "e5 base sts en de," is a version of multilingual-e5-base fine-tuned to score semantic similarity between English and German texts. It was trained on the German subsets of paraphrase and semantic textual similarity corpora. Its main capability is measuring how close two texts are in meaning, which makes it useful for tasks such as multilingual information retrieval, paraphrase detection, or evaluating textual coherence. It reaches 0.920 on the validation subset and 0.904 on the test subset of stsb, and it is best suited to applications that compare texts in these two languages.

Documentation

INFO: The model is being continuously updated.

The model is a multilingual-e5-base model fine-tuned for semantic textual similarity (STS).
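
A minimal sketch of how such a model could be used to score sentence pairs, assuming it loads through the sentence-transformers library (the tags mention transformers and feature-extraction, but the card itself shows no usage). The repository id `danielheinz/e5-base-sts-en-de` is inferred from the model name and author and is an assumption; the helper names `cosine`, `score_pairs`, and `load_encoder` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed repository id, inferred from the model name and author; not stated in the card.
MODEL_ID = "danielheinz/e5-base-sts-en-de"


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = sum(float(a) ** 2 for a in u) ** 0.5
    norm_v = sum(float(b) ** 2 for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def score_pairs(encode: Callable, pairs: List[Tuple[str, str]]) -> List[float]:
    """Embed both sides of every pair and return one cosine score per pair."""
    left = encode([a for a, _ in pairs])
    right = encode([b for _, b in pairs])
    return [cosine(u, v) for u, v in zip(left, right)]


def load_encoder(model_id: str = MODEL_ID) -> Callable:
    """Return an encode function backed by sentence-transformers.

    Calling this downloads the model, so the import is kept local.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode
```

With the library installed, `score_pairs(load_encoder(), [("The weather is nice today.", "Das Wetter ist heute schön.")])` would return one similarity score for the English and German pair. Passing the encoder in as an argument keeps the scoring logic independent of how the embeddings are produced.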

Model Training

The model has been fine-tuned on the German subsets of the following datasets:

The training procedure can be divided into two stages:

training on paraphrase datasets with the Multiple Negatives Ranking Loss
training on semantic textual similarity datasets using the Cosine Similarity Loss
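
The two objectives can be sketched in plain Python to show what each stage optimizes. This follows the usual definitions of these losses as implemented in sentence-transformers: Multiple Negatives Ranking Loss applies cross-entropy over scaled cosine similarities, treating the other positives in a batch as negatives, and Cosine Similarity Loss is the mean squared error between the predicted cosine similarity and the gold score. The scale of 20, the assumption that gold scores are scaled to [0, 1], and the function names are illustrative; the card does not give the training hyperparameters.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def multiple_negatives_ranking_loss(
    anchors: List[Vector], positives: List[Vector], scale: float = 20.0
) -> float:
    """Stage 1 objective: positives[i] is the correct match for anchors[i];
    every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        # Cross-entropy with the matching positive as the target class.
        total += log_z - logits[i]
    return total / len(anchors)


def cosine_similarity_loss(
    left: List[Vector], right: List[Vector], gold: List[float]
) -> float:
    """Stage 2 objective: mean squared error between the cosine similarity
    of each pair and its gold score (assumed here to lie in [0, 1])."""
    errors = [(cosine(u, v) - g) ** 2 for u, v, g in zip(left, right, gold)]
    return sum(errors) / len(errors)
```

The first loss only needs pairs known to be paraphrases, which is why it fits the paraphrase stage; the second needs graded similarity labels, which STS datasets provide.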

Results

The model achieves the following results:

0.920 on stsb's validation subset
0.904 on stsb's test subset
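
The card does not name the metric behind these numbers. STS benchmark results are commonly reported as the Spearman rank correlation between a model's cosine similarities and the human-assigned scores; assuming that convention applies here, the computation can be sketched as follows. The function names are hypothetical and the snippet is not taken from the author's evaluation code.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(gold))
```

Because only the ordering matters, the gold scores do not need to be rescaled to match the range of the cosine similarities.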

Capabilities & Tags

transformers · safetensors · xlm-roberta · feature-extraction · de · model-index · text-embeddings-inference · endpoints_compatible

Links & Resources