AI/EXPLORER
ToolsCategoriesSitesLLMsCompareAI QuizAlternativesPremium
—AI Tools
—Sites & Blogs
—LLMs & Models
—Categories
AI Explorer

Find and compare the best artificial intelligence tools for your projects.


© 2026 AI Explorer · All rights reserved.


clip japanese base

by line-corporation

Open source · 21k downloads · 29 likes

Rating: 1.8 (29 reviews) · Embedding · API & Local
About

The *clip-japanese-base* model is a Japanese version of CLIP, designed to understand and connect Japanese text and images. Because it learns to associate textual descriptions with visual content, it excels at tasks such as zero-shot image classification and cross-modal retrieval (finding images from text, or text from images). Trained on roughly a billion image-text pairs collected from the web, it captures Japanese linguistic and cultural subtleties. Use cases include visual content analysis, automatic moderation, and multimodal search engines. What sets it apart is its robustness on Japanese data, combined with a high-performance architecture tailored to the language's specificities.

Documentation

clip-japanese-base

This is a Japanese CLIP (Contrastive Language-Image Pre-training) model developed by LY Corporation. This model was trained on ~1B web-collected image-text pairs, and it is applicable to various visual tasks including zero-shot image classification, text-to-image or image-to-text retrieval.

How to use

  1. Install packages
Code
pip install pillow requests sentencepiece transformers torch timm
  2. Run
Python
import io
import requests
from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModel, AutoTokenizer

HF_MODEL_PATH = 'line-corporation/clip-japanese-base'
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(HF_MODEL_PATH, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(HF_MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(HF_MODEL_PATH, trust_remote_code=True).to(device)

# Download and preprocess a sample image
image = Image.open(io.BytesIO(requests.get('https://images.pexels.com/photos/2253275/pexels-photo-2253275.jpeg?auto=compress&cs=tinysrgb&dpr=3&h=750&w=1260').content))
image = processor(image, return_tensors="pt").to(device)
# Candidate Japanese labels: dog, cat, elephant
text = tokenizer(["犬", "猫", "象"]).to(device)

with torch.no_grad():
    image_features = model.get_image_features(**image)
    text_features = model.get_text_features(**text)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
# [[1., 0., 0.]]
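The retrieval tasks mentioned above reduce to cosine similarity between the vectors returned by `get_image_features` and `get_text_features`. A minimal sketch with dummy tensors standing in for real model outputs (the 512-dimensional feature size and the random values are assumptions for illustration only):

```python
import torch
import torch.nn.functional as F

# Dummy stand-ins for model.get_image_features / model.get_text_features
torch.manual_seed(0)
image_features = torch.randn(4, 512)  # 4 candidate images
text_features = torch.randn(1, 512)   # 1 Japanese text query, e.g. "犬"

# L2-normalize so the dot product equals cosine similarity
image_features = F.normalize(image_features, dim=-1)
text_features = F.normalize(text_features, dim=-1)

# Text-to-image retrieval: rank candidate images for the query
similarity = text_features @ image_features.T            # shape (1, 4)
ranking = similarity.argsort(dim=-1, descending=True)[0]
print("images ranked by similarity:", ranking.tolist())
```

Image-to-text retrieval is the same computation transposed: score the text embeddings against a single image embedding and sort.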

Model architecture

The model uses an Eva02-B Transformer architecture as the image encoder and a 12-layer BERT as the text encoder. The text encoder was initialized from rinna/japanese-clip-vit-b-16.

Evaluation

Dataset

  • STAIR Captions (the 2014 validation split of MSCOCO) for image-to-text (i2t) and text-to-image (t2i) retrieval. Performance is reported as R@1, the average of recall@1 over i2t and t2i retrieval.
  • Recruit Datasets for image classification.
  • ImageNet-1K for image classification. We translated all classnames into Japanese. The classnames and templates can be found in ja-imagenet-1k-classnames.txt and ja-imagenet-1k-templates.txt.
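The averaged R@1 metric described above can be computed from a pairwise image-caption similarity matrix. A toy sketch (the matrix values are made up, with ground-truth pairs on the diagonal):

```python
import numpy as np

# Toy similarity matrix: rows = images, cols = captions;
# the correct image-caption pairs lie on the diagonal.
sim = np.array([
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.4],
    [0.2, 0.7, 0.6],  # image 2's top caption is 1, not 2 -> an i2t miss
])

i2t_r1 = np.mean(sim.argmax(axis=1) == np.arange(3))  # image-to-text recall@1
t2i_r1 = np.mean(sim.argmax(axis=0) == np.arange(3))  # text-to-image recall@1
r1 = (i2t_r1 + t2i_r1) / 2                            # reported R@1
print(i2t_r1, t2i_r1, r1)  # 0.666... 1.0 0.833...
```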

Result

Model | Image Encoder Params | Text Encoder Params | STAIR Captions (R@1) | Recruit Datasets (acc@1) | ImageNet-1K (acc@1)
Ours | 86M (Eva02-B) | 100M (BERT) | 0.30 | 0.89 | 0.58
Stable-ja-clip | 307M (ViT-L) | 100M (BERT) | 0.24 | 0.77 | 0.68
Rinna-ja-clip | 86M (ViT-B) | 100M (BERT) | 0.13 | 0.54 | 0.56
Laion-clip | 632M (ViT-H) | 561M (XLM-RoBERTa) | 0.30 | 0.83 | 0.58
Hakuhodo-ja-clip | 632M (ViT-H) | 100M (BERT) | 0.21 | 0.82 | 0.46
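The encoder sizes in the table (86M, 100M, ...) are trainable-parameter counts, which can be reproduced for any PyTorch submodule. A sketch with a toy module standing in for the real encoders (the real model's encoder attribute names are not documented here):

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    # Total trainable parameters, the quantity reported in the table above
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Demo on a tiny stand-in; for the real model you would pass its image
# and text encoder submodules.
toy = nn.Linear(512, 512)
print(count_params(toy))  # weight 512*512 + bias 512 = 262656
```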

Licenses

The Apache License, Version 2.0

Citation

BibTeX
@misc{clip-japanese-base,
    title = {CLIP Japanese Base},
    author = {Shuhei Yokoo and Shuntaro Okada and Peifei Zhu and Shuhei Nishimura and Naoki Takayama},
    url = {https://huggingface.co/line-corporation/clip-japanese-base},
}
Capabilities & Tags
transformers · onnx · safetensors · clyp · feature-extraction · clip · japanese-clip · custom_code · ja
Specifications
Category: Embedding
Access: API & Local
License: Open Source
Pricing: Open Source
Rating: 1.8

Try clip japanese base

Access the model directly