NeoBERT is a next-generation encoder model for English text representation, pre-trained from scratch on the RefinedWeb dataset. NeoBERT integrates state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. It is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an optimal depth-to-width ratio, and leverages an extended context length of 4,096 tokens. Despite its compact 250M parameter footprint, it is the most efficient model of its kind and achieves state-of-the-art results on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions.
Ensure you have the following dependencies installed:
```bash
pip install transformers torch xformers==0.0.28.post3
```
If you would like to use sequence packing (un-padding), you will also need to install flash-attention:

```bash
pip install transformers torch xformers==0.0.28.post3 flash_attn
```
Load the model using Hugging Face Transformers:
```python
from transformers import AutoModel, AutoTokenizer

model_name = "chandar-lab/NeoBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

# Tokenize input text
text = "NeoBERT is the most efficient model of its kind!"
inputs = tokenizer(text, return_tensors="pt")

# Generate embeddings: use the [CLS] (first) token as the sequence representation
outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)  # torch.Size([1, 768])
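```

As a quick usage sketch, the [CLS] embeddings above can be compared with cosine similarity. The snippet below reuses the `tokenizer` and `model` from the previous example; the sentence pair is made up for illustration, and embeddings from the raw pre-trained checkpoint are not contrastively fine-tuned, so treat the scores as illustrative rather than retrieval-quality:

```python
import torch
import torch.nn.functional as F

sentences = [
    "NeoBERT is a next-generation encoder.",
    "The weather is nice today.",
]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    cls_embeddings = model(**batch).last_hidden_state[:, 0, :]

# Cosine similarity between the two [CLS] embeddings
score = F.cosine_similarity(cls_embeddings[0], cls_embeddings[1], dim=-1)
print(score.item())
```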
| Feature | NeoBERT |
|---|---|
| Depth-to-width | 28 × 768 |
| Parameter count | 250M |
| Activation | SwiGLU |
| Positional embeddings | RoPE |
| Normalization | Pre-RMSNorm |
| Data source | RefinedWeb |
| Data size | 2.8 TB |
| Tokenizer | google/bert |
| Context length | 4,096 |
| MLM masking rate | 20% |
| Optimizer | AdamW |
| Scheduler | CosineDecay |
| Training tokens | 2.1 T |
| Efficiency | FlashAttention |
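To sanity-check these settings locally, the released configuration can be printed. This is a minimal sketch that assumes the same `trust_remote_code` loading path shown earlier; exact config field names may vary for custom architectures, so printing the full config is the safest way to inspect them:

```python
from transformers import AutoConfig

# Print the released configuration; expect 28 layers, hidden size 768,
# and a 4,096-token context, matching the table above.
config = AutoConfig.from_pretrained("chandar-lab/NeoBERT", trust_remote_code=True)
print(config)
```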
Model weights and code repository are licensed under the permissive MIT license.
If you use this model in your research, please cite:
```bibtex
@misc{breton2025neobertnextgenerationbert,
      title={NeoBERT: A Next-Generation BERT},
      author={Lola Le Breton and Quentin Fournier and Mariam El Mezouar and Sarath Chandar},
      year={2025},
      eprint={2502.19587},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.19587},
}
```
For questions, do not hesitate to reach out and open an issue here or on our GitHub.