by OrdalieTech
Solon embeddings large 0.1 is an open-source model specialized in generating embeddings for French text. It targets text similarity tasks such as classification, reranking, and semantic textual similarity, and is evaluated on nine French benchmarks. Adding the "query : " prefix to queries improves performance in information retrieval. It is intended for applications that need fine-grained analysis of French, including sentiment analysis and intent detection, and is aimed at professionals and researchers working with French.
State-of-the-art open-source French embedding model.
Instructions:
Prepend "query : " to each search query to improve retrieval performance.
Passages need no prefix.
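The prefix rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `format_query`, `cosine`, and `rank_passages` are hypothetical helper names, the prefix string is copied verbatim from the instructions (including the space before the colon), and `embed` stands for any callable that maps a list of strings to a list of vectors.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Prefix copied from the instructions above; applied to queries only.
QUERY_PREFIX = "query : "

Vector = Sequence[float]


def format_query(text: str) -> str:
    """Prepend the query prefix unless it is already present."""
    return text if text.startswith(QUERY_PREFIX) else QUERY_PREFIX + text


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(
    query: str,
    passages: List[str],
    embed: Callable[[List[str]], Sequence[Vector]],
) -> List[Tuple[str, float]]:
    """Embed the prefixed query and the raw passages, then sort passages
    by decreasing cosine similarity to the query."""
    vectors = embed([format_query(query)] + list(passages))
    query_vec, passage_vecs = vectors[0], vectors[1:]
    scored = [(p, cosine(query_vec, v)) for p, v in zip(passages, passage_vecs)]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Assuming the checkpoint loads through the sentence-transformers library, `SentenceTransformer("OrdalieTech/Solon-embeddings-large-0.1").encode` could be passed as `embed`. Any other embedding function with the same input and output shape works as well.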
| Model | Mean Score |
|---|---|
| OrdalieTech/Solon-embeddings-large-0.1 | 0.7490 |
| cohere/embed-multilingual-v3 | 0.7402 |
| OrdalieTech/Solon-embeddings-base-0.1 | 0.7306 |
| openai/ada-002 | 0.7290 |
| cohere/embed-multilingual-light-v3 | 0.6945 |
| antoinelouis/biencoder-camembert-base-mmarcoFR | 0.6826 |
| dangvantuan/sentence-camembert-large | 0.6756 |
| voyage/voyage-01 | 0.6753 |
| intfloat/multilingual-e5-large | 0.6660 |
| intfloat/multilingual-e5-base | 0.6597 |
| Sbert/paraphrase-multilingual-mpnet-base-v2 | 0.5975 |
| dangvantuan/sentence-camembert-base | 0.5456 |
| EuropeanParliament/eubert_embedding_v1 | 0.5063 |
These results were obtained on 9 French benchmarks covering a variety of text similarity tasks (classification, reranking, STS).
We created OrdalieFRSTS and OrdalieFRReranking to broaden the benchmarks available for French STS and reranking.
(Evaluation script available at github.com/OrdalieTech/mteb)