by Xenova
The *Jina Embeddings v2 base en* model is an artificial intelligence tool that converts text into numerical vectors, facilitating semantic understanding and comparison between sentences or documents. Its core capabilities include generating high-quality embeddings for English text, enabling applications such as semantic search, classification, and similarity detection. It stands out for its efficiency and compatibility with modern frameworks like Transformers.js, particularly through optimized ONNX weights for web deployment. This model is especially useful for developers looking to integrate advanced natural language processing features into interactive applications or systems requiring fast, accurate text analysis.
This repository contains [jinaai/jina-embeddings-v2-base-en](https://huggingface.co/jinaai/jina-embeddings-v2-base-en) with ONNX weights to be compatible with Transformers.js.
If you haven't already, you can install the Transformers.js JavaScript library from NPM using:
```bash
npm i @huggingface/transformers
```
```js
import { pipeline, cos_sim } from '@huggingface/transformers';

// Create a feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'Xenova/jina-embeddings-v2-base-en', {
  dtype: 'fp32', // Options: "fp32", "fp16", "q8", "q4"
});

// Generate embeddings
const output = await extractor(
  ['How is the weather today?', 'What is the current weather like today?'],
  { pooling: 'mean' },
);

// Compute cosine similarity
console.log(cos_sim(output[0].data, output[1].data)); // 0.9341313949712492 (unquantized) vs. 0.9022937687830741 (quantized)
```
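For intuition, the `pooling: 'mean'` option averages the per-token vectors into a single sentence vector, and `cos_sim` compares two vectors by the cosine of the angle between them. A minimal plain-JavaScript sketch of these two operations (illustrative helpers only, not the library's implementation):

```javascript
// Mean pooling: average per-token vectors into one sentence vector.
function meanPool(tokenVectors) {
  // tokenVectors: array of equal-length number arrays, one per token
  const dim = tokenVectors[0].length;
  const pooled = new Array(dim).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) pooled[i] += vec[i];
  }
  return pooled.map((x) => x / tokenVectors.length);
}

// Cosine similarity: dot product of the vectors divided by
// the product of their Euclidean norms.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Identical directions score 1; orthogonal vectors score 0.
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
console.log(meanPool([[1, 2], [3, 4]])); // [2, 3]
```

Values near 1 indicate semantically similar texts, which is why the two weather questions above score around 0.93.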
Note: Having a separate repo for ONNX weights is intended to be a temporary solution until WebML gains more traction. If you would like to make your models web-ready, we recommend converting to ONNX using 🤗 Optimum and structuring your repo like this one (with ONNX weights located in a subfolder named `onnx`).
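As a sketch, a conversion with 🤗 Optimum's CLI exporter might look like the following (the model ID and output folder are placeholders to adapt to your own model; exact flags may vary by Optimum version):

```shell
# Install the ONNX exporter extras for 🤗 Optimum (assumes a Python environment)
pip install "optimum[exporters]"

# Export the model to ONNX into an `onnx/` subfolder,
# matching the repo layout described above
optimum-cli export onnx --model jinaai/jina-embeddings-v2-base-en onnx/
```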