NeoBERT is a next-generation encoder model for English text representation, pre-trained from scratch on the RefinedWeb dataset. NeoBERT integrates state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. It is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an optimal depth-to-width ratio, and leverages an extended context length of 4,096 tokens. Despite its compact 250M parameter footprint, it is the most efficient model of its kind and achieves state-of-the-art results on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions.
Ensure you have the following dependencies installed:
```bash
pip install transformers torch xformers==0.0.28.post3
```
If you would like to use sequence packing (un-padding), you will also need to install flash-attention:

```bash
pip install transformers torch xformers==0.0.28.post3 flash_attn
```
Load the model using Hugging Face Transformers:
```python
from transformers import AutoModel, AutoTokenizer

model_name = "chandar-lab/NeoBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

# Tokenize input text
text = "NeoBERT is the most efficient model of its kind!"
inputs = tokenizer(text, return_tensors="pt")

# Generate embeddings: use the [CLS] (first) token as the sequence representation
outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)  # torch.Size([1, 768])
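```

As a quick usage sketch, the [CLS] embeddings above can be compared with cosine similarity. The snippet below reuses the `tokenizer` and `model` from the previous example; the sentence pair is made up for illustration, and embeddings from the raw pre-trained checkpoint are not contrastively fine-tuned, so treat the scores as illustrative rather than retrieval-quality:

```python
import torch
import torch.nn.functional as F

sentences = [
    "NeoBERT is a next-generation encoder.",
    "The weather is nice today.",
]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    cls_embeddings = model(**batch).last_hidden_state[:, 0, :]

# Cosine similarity between the two [CLS] embeddings
score = F.cosine_similarity(cls_embeddings[0], cls_embeddings[1], dim=-1)
print(score.item())
```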
| Feature | NeoBERT |
|---|---|
| Depth-to-width | 28 × 768 |
| Parameter count | 250M |
| Activation | SwiGLU |
| Positional embeddings | RoPE |
| Normalization | Pre-RMSNorm |
| Data source | RefinedWeb |
| Data size | 2.8 TB |
| Tokenizer | google/bert |
| Context length | 4,096 |
| MLM masking rate | 20% |
| Optimizer | AdamW |
| Scheduler | CosineDecay |
| Training tokens | 2.1 T |
| Efficiency | FlashAttention |
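To sanity-check these settings locally, the released configuration can be printed. This is a minimal sketch that assumes the same `trust_remote_code` loading path shown earlier; exact config field names may vary for custom architectures, so printing the full config is the safest way to inspect them:

```python
from transformers import AutoConfig

# Print the released configuration; expect 28 layers, hidden size 768,
# and a 4,096-token context, matching the table above.
config = AutoConfig.from_pretrained("chandar-lab/NeoBERT", trust_remote_code=True)
print(config)
```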
Model weights and code repository are licensed under the permissive MIT license.
If you use this model in your research, please cite:
```bibtex
@misc{breton2025neobertnextgenerationbert,
      title={NeoBERT: A Next-Generation BERT},
      author={Lola Le Breton and Quentin Fournier and Mariam El Mezouar and Sarath Chandar},
      year={2025},
      eprint={2502.19587},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.19587},
}
```
For questions, do not hesitate to reach out and open an issue here or on our GitHub.