AI Explorer

Find and compare the best artificial intelligence tools for your projects.



sbert_large_nlu_ru

by ai-forever

Open source · 102k downloads · 100 likes

2.5 (100 reviews) · Embedding · API & Local
About

The SBERT Large NLU RU model is a specialized version of BERT optimized for generating sentence embeddings in Russian. It converts text into dense numerical vectors, enabling tasks such as semantic search, classification, and sentence-similarity comparison. Its key strength is advanced contextual understanding of Russian, which makes it well suited to applications that require nuanced language analysis. The model stands out for its accuracy and efficiency, particularly through the use of mean token embeddings to improve representation quality. It is especially appropriate for natural language processing projects where subtlety and context are critical.
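To illustrate the semantic-search use case mentioned above, here is a minimal sketch that ranks candidate documents by cosine similarity to a query. The toy 3-dimensional vectors and the `doc_a`/`doc_b` names are hypothetical stand-ins for real 1024-dimensional SBERT embeddings produced by the model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real sentence embeddings
query = [1.0, 0.2, 0.0]
corpus = {
    "doc_a": [0.9, 0.1, 0.1],  # semantically close to the query
    "doc_b": [0.0, 0.0, 1.0],  # unrelated
}

# Rank documents by similarity to the query, most similar first
ranked = sorted(corpus, key=lambda k: cosine(query, corpus[k]), reverse=True)
```

With real embeddings the principle is identical: embed the query and each document once, then rank by cosine similarity.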

Documentation

BERT large model (uncased) for sentence embeddings in the Russian language.

The model is described in this article.
For better quality, use mean token embeddings.

Usage (HuggingFace Models Repository)

You can use the model directly from the model repository to compute sentence embeddings:

Python
from transformers import AutoTokenizer, AutoModel
import torch


# Mean pooling: take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element of model_output holds all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return sum_embeddings / sum_mask


# Sentences we want sentence embeddings for
sentences = ['Привет! Как твои дела?',                 # "Hi! How are you?"
             'А правда, что 42 твое любимое число?']   # "Is it true that 42 is your favorite number?"

# Load the tokenizer and model from the Hugging Face model repository
tokenizer = AutoTokenizer.from_pretrained("ai-forever/sbert_large_nlu_ru")
model = AutoModel.from_pretrained("ai-forever/sbert_large_nlu_ru")

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=24, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling (here: mean pooling)
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
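Once sentence embeddings have been computed as above, they can be compared with cosine similarity. A minimal sketch using PyTorch, where the small dummy tensor stands in for the real `sentence_embeddings` (the actual model produces 1024-dimensional vectors):

```python
import torch
import torch.nn.functional as F

# Dummy embeddings standing in for sentence_embeddings from the snippet above
emb = torch.tensor([[1.0, 0.0, 0.0],
                    [0.8, 0.6, 0.0],
                    [0.0, 0.0, 1.0]])

# L2-normalize rows so that dot products equal cosine similarities
emb_norm = F.normalize(emb, p=2, dim=1)

# Pairwise cosine-similarity matrix: sim[i, j] compares sentence i with sentence j
sim = emb_norm @ emb_norm.T
```

Normalizing once and taking a matrix product is the usual way to get all pairwise similarities in a single operation.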

Authors

  • SberDevices Team
  • Aleksandr Abramov: HF profile, GitHub, Kaggle Competitions Master
  • Denis Antykhov: GitHub
  • Ibragim Badertdinov: GitHub
Capabilities & Tags

transformers · pytorch · safetensors · bert · feature-extraction · PyTorch · Transformers · ru · text-embeddings-inference · endpoints_compatible
Specifications

Category: Embedding
Access: API & Local
License: Open Source
Pricing: Open Source
Rating: 2.5
