CodeBERT-base by Microsoft (open source)
CodeBERT-base is a pre-trained language model designed to understand both programming language and natural language text. It supports tasks such as code search, generating documentation from code, and code completion by leveraging a joint understanding of both modalities. Its training approach combines masked language modeling (MLM) with replaced token detection (RTD), a discriminative objective that distinguishes original tokens from plausible replacements, which helps it capture relationships between code and textual descriptions. The model handles multiple programming languages while maintaining strong performance, making it useful for developers and researchers who want to automate code-related tasks or build programming assistance tools.
Pretrained weights for CodeBERT: A Pre-Trained Model for Programming and Natural Languages.
The model is trained on the bi-modal data (documentation & code pairs) of CodeSearchNet.
It is initialized with RoBERTa-base and trained with the MLM+RTD objective (cf. the paper).
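Because the checkpoint is published on the Hugging Face Hub as `microsoft/codebert-base`, it can be loaded with the standard `transformers` auto classes. The sketch below shows how a natural-language query and a code snippet can be encoded together as a bi-modal pair; the example strings are illustrative, and the first token's hidden state is used here simply as a summary vector (downstream tasks typically fine-tune on top of it).

```python
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

# Bi-modal input: a natural-language description paired with a code snippet.
nl = "return maximum value"
code = "def max(a, b): return a if a > b else b"
inputs = tokenizer(nl, code, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Hidden state of the first token, often used as a sequence-level summary.
embedding = outputs.last_hidden_state[:, 0]
print(embedding.shape)  # torch.Size([1, 768]) — RoBERTa-base hidden size
```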
Please see the official repository for scripts that support "code search" and "code-to-document generation".
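The official repository contains the full fine-tuning pipelines for those tasks. As a rough illustration of the code-search idea only, the sketch below ranks candidate snippets against a query by cosine similarity of their encoder embeddings. Note that raw, un-fine-tuned CodeBERT embeddings are not expected to match the quality of the fine-tuned models in the official scripts; the query and snippets here are made-up examples.

```python
from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base").eval()

def embed(text: str) -> torch.Tensor:
    """Encode text and return the first token's hidden state as a vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state[:, 0].squeeze(0)

query = "sort a list in descending order"
snippets = [
    "def f(xs): return sorted(xs, reverse=True)",
    "def g(path): return open(path).read()",
]

# Score each snippet by cosine similarity to the query embedding.
q = embed(query)
scores = [F.cosine_similarity(q, embed(s), dim=0).item() for s in snippets]
best = snippets[scores.index(max(scores))]
print(best)
```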
@misc{feng2020codebert,
      title={CodeBERT: A Pre-Trained Model for Programming and Natural Languages},
      author={Zhangyin Feng and Daya Guo and Duyu Tang and Nan Duan and Xiaocheng Feng and Ming Gong and Linjun Shou and Bing Qin and Ting Liu and Daxin Jiang and Ming Zhou},
      year={2020},
      eprint={2002.08155},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}