
KoSimCSE roberta multitask

by BM-K

Open source · 21k downloads · 28 likes

Rating: 1.8 (28 reviews) · Embedding · API & Local
About

KoSimCSE roberta multitask is a sentence embedding model for Korean, trained so that semantically similar sentences are mapped to nearby vectors. It is built on RoBERTa and, as its name indicates, follows the SimCSE (contrastive learning) approach with multitask training, which combines several learning objectives, such as semantic similarity and classification, to improve the quality of the vector representations. It suits tasks that depend on sentence-level meaning in Korean, including information retrieval, document clustering, and paraphrase detection, and can also support sentiment analysis, content recommendation, and automated dialogue systems. On the Korean Semantic Textual Similarity (STS) test set reported under Performance, it has the highest average score (85.77) of the listed models.
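To illustrate the retrieval and paraphrase-detection use cases, here is a minimal sketch that ranks documents by cosine similarity to a query embedding. It assumes the embeddings have already been computed (for example the first-token vectors from the Quick tour, converted to lists with `.tolist()`). The 3-dimensional vectors, the document IDs, and the 0.8 paraphrase threshold are hypothetical stand-ins, not values produced by the model.

```python
from __future__ import annotations

import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       docs: dict[str, Sequence[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, most similar to the query first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def is_paraphrase(a: Sequence[float], b: Sequence[float],
                  threshold: float = 0.8) -> bool:
    """Flag a pair as paraphrases when cosine similarity reaches the
    threshold. The 0.8 default is hypothetical; tune it on labelled data."""
    return cosine(a, b) >= threshold


# Hypothetical toy embeddings standing in for real model outputs.
query = [1.0, 0.0, 1.0]
docs = {
    "cheetah_chase": [0.9, 0.1, 1.0],
    "cheetah_run": [1.0, 0.3, 0.7],
    "monkey_drum": [0.0, 1.0, 0.1],
}

for doc_id, score in rank_by_similarity(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

Cosine similarity ignores vector length, so documents are compared by direction only; clustering can reuse the same `cosine` function as its distance measure.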

Documentation

https://github.com/BM-K/Sentence-Embedding-is-all-you-need

Korean-Sentence-Embedding

🍭 Korean sentence embedding repository. You can download the pre-trained models and run inference right away; the repository also provides environments in which you can train your own models.

Quick tour

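```python
import torch
from transformers import AutoModel, AutoTokenizer

def cal_score(a, b):
    # Cosine similarity between the rows of a and b, scaled by 100.
    if len(a.shape) == 1: a = a.unsqueeze(0)
    if len(b.shape) == 1: b = b.unsqueeze(0)

    a_norm = a / a.norm(dim=1)[:, None]
    b_norm = b / b.norm(dim=1)[:, None]
    return torch.mm(a_norm, b_norm.transpose(0, 1)) * 100

model = AutoModel.from_pretrained('BM-K/KoSimCSE-roberta-multitask')
tokenizer = AutoTokenizer.from_pretrained('BM-K/KoSimCSE-roberta-multitask')

sentences = ['치타가 들판을 가로 질러 먹이를 쫓는다.',   # "A cheetah chases its prey across the field."
             '치타 한 마리가 먹이 뒤에서 달리고 있다.',  # "A cheetah is running behind its prey."
             '원숭이 한 마리가 드럼을 연주한다.']        # "A monkey is playing the drums."

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
# With return_dict=False the model returns a tuple; the first element is the
# last hidden state with shape (batch, seq_len, hidden_size).
embeddings, _ = model(**inputs, return_dict=False)

# embeddings[i][0] is the first-token ([CLS]) vector of sentence i.
score01 = cal_score(embeddings[0][0], embeddings[1][0])  # related sentences
score02 = cal_score(embeddings[0][0], embeddings[2][0])  # unrelated sentences
print(score01, score02)
```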
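Because `cal_score` normalizes whole rows and then uses a matrix product, it can also score every pair in a batch at once instead of one pair at a time. The sketch below assumes the model output has shape (batch, seq_len, hidden_size) and uses a random tensor in place of `embeddings` so it runs without downloading the model; `first_token_vectors` is a hypothetical helper name, and the tensor sizes are arbitrary.

```python
import torch


def cal_score(a, b):
    # Same helper as in the Quick tour: row-wise cosine similarity times 100.
    if len(a.shape) == 1: a = a.unsqueeze(0)
    if len(b.shape) == 1: b = b.unsqueeze(0)
    a_norm = a / a.norm(dim=1)[:, None]
    b_norm = b / b.norm(dim=1)[:, None]
    return torch.mm(a_norm, b_norm.transpose(0, 1)) * 100


def first_token_vectors(last_hidden_state: torch.Tensor) -> torch.Tensor:
    """(batch, seq_len, hidden) -> (batch, hidden): the vector that
    embeddings[i][0] selects for each sentence i."""
    return last_hidden_state[:, 0]


# Random stand-in for the `embeddings` tensor from the Quick tour
# (3 sentences, 8 tokens, hidden size 16; the sizes are arbitrary).
torch.manual_seed(0)
hidden = torch.randn(3, 8, 16)

cls = first_token_vectors(hidden)
scores = cal_score(cls, cls)  # shape (3, 3); scores[i, j] compares sentence i with j
print(scores)
```

With the real model, pass `embeddings` instead of `hidden`; `scores[0, 1]` and `scores[0, 2]` then correspond to `score01` and `score02`, and the diagonal is 100 (each sentence compared with itself).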

Performance

  • Semantic Textual Similarity test set results
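| Model | AVG | Cosine Pearson | Cosine Spearman | Euclidean Pearson | Euclidean Spearman | Manhattan Pearson | Manhattan Spearman | Dot Pearson | Dot Spearman |
|---|---|---|---|---|---|---|---|---|---|
| KoSBERT† (SKT) | 77.40 | 78.81 | 78.47 | 77.68 | 77.78 | 77.71 | 77.83 | 75.75 | 75.22 |
| KoSBERT | 80.39 | 82.13 | 82.25 | 80.67 | 80.75 | 80.69 | 80.78 | 77.96 | 77.90 |
| KoSRoBERTa | 81.64 | 81.20 | 82.20 | 81.79 | 82.34 | 81.59 | 82.20 | 80.62 | 81.25 |
| KoSentenceBART | 77.14 | 79.71 | 78.74 | 78.42 | 78.02 | 78.40 | 78.00 | 74.24 | 72.15 |
| KoSentenceT5 | 77.83 | 80.87 | 79.74 | 80.24 | 79.36 | 80.19 | 79.27 | 72.81 | 70.17 |
| KoSimCSE-BERT† (SKT) | 81.32 | 82.12 | 82.56 | 81.84 | 81.63 | 81.99 | 81.74 | 79.55 | 79.19 |
| KoSimCSE-BERT | 83.37 | 83.22 | 83.58 | 83.24 | 83.60 | 83.15 | 83.54 | 83.13 | 83.49 |
| KoSimCSE-RoBERTa | 83.65 | 83.60 | 83.77 | 83.54 | 83.76 | 83.55 | 83.77 | 83.55 | 83.64 |
| KoSimCSE-BERT-multitask | 85.71 | 85.29 | 86.02 | 85.63 | 86.01 | 85.57 | 85.97 | 85.26 | 85.93 |
| KoSimCSE-RoBERTa-multitask | 85.77 | 85.08 | 86.12 | 85.84 | 86.12 | 85.83 | 86.12 | 85.03 | 85.99 |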
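The columns follow a standard STS evaluation pattern: each sentence pair is scored with a similarity function (cosine, Euclidean, Manhattan, or dot product), and the scores are correlated with human similarity labels using Pearson and Spearman correlation. The AVG column equals the mean of the eight correlation columns (for example, 77.40 for the first row). The sketch below reproduces that computation in plain Python under stated assumptions: distances are negated so that a higher value means more similar, correlations are multiplied by 100, and the embedding pairs and labels are hypothetical toy data. It is not the repository's evaluation script.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot_ab = sum(x * y for x, y in zip(a, b))
    return dot_ab / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neg_euclidean(a: Vector, b: Vector) -> float:
    # Distances are negated (an assumption) so larger means "more similar".
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neg_manhattan(a: Vector, b: Vector) -> float:
    return -sum(abs(x - y) for x, y in zip(a, b))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(xs), ranks(ys))


SCORERS: dict[str, Callable[[Vector, Vector], float]] = {
    "Cosine": cosine, "Euclidean": neg_euclidean,
    "Manhattan": neg_manhattan, "Dot": dot,
}


def sts_report(pairs: list[tuple[Vector, Vector]], gold: list[float]) -> dict[str, float]:
    report: dict[str, float] = {}
    for name, score_fn in SCORERS.items():
        preds = [score_fn(a, b) for a, b in pairs]
        report[f"{name} Pearson"] = 100 * pearson(preds, gold)
        report[f"{name} Spearman"] = 100 * spearman(preds, gold)
    # AVG: mean of the eight correlation columns, as in the table.
    report["AVG"] = sum(report.values()) / len(report)
    return report


# Hypothetical toy embedding pairs and gold similarity labels (0-5 scale).
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
gold = [5.0, 2.5, 0.0]
for column, value in sts_report(pairs, gold).items():
    print(f"{column}: {value:.2f}")
```

Pearson measures how linearly the model scores track the labels, while Spearman only checks that their ordering agrees, which is why the two can differ for the same similarity function.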
Capabilities & Tags
transformers · pytorch · safetensors · roberta · feature-extraction · korean · ko · text-embeddings-inference · endpoints_compatible
Specifications
Category: Embedding
Access: API & Local
License: Open Source
Pricing: Open Source
Rating: 1.8
