
PowerMoE 3b

by ibm-research

Open source · 686k downloads · 18 likes

1.6 (18 reviews) · Chat · API & Local
About

PowerMoE 3b is a language model built on a sparse Mixture-of-Experts (sMoE) architecture with 3 billion total parameters. Its selective routing mechanism activates only 800 million parameters per token, delivering performance comparable to dense models twice its size. Trained on a mix of open-source and proprietary data, it performs well across diverse tasks such as natural-language multiple-choice question answering, code generation, and mathematical reasoning. The model stands out for its computational efficiency: it competes with heavier architectures while reducing inference cost, making it well suited to applications that need both accuracy and speed.
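As a quick back-of-the-envelope check on the efficiency claim, using the figures quoted above (800M active of 3B total parameters):

```python
# Fraction of PowerMoE-3b's parameters active per token, per the figures above.
total_params = 3_000_000_000
active_params = 800_000_000

fraction = active_params / total_params
print(f"{fraction:.1%} of parameters active per token")  # → 26.7%
```

So roughly a quarter of the model participates in each forward pass, which is where the inference-cost savings over an equally capable dense model come from.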

Documentation

Model Summary

PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning-rate scheduler. It sparsely activates 800M parameters for each token and is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to dense models with 2x the active parameters across various benchmarks, including natural-language multiple-choice tasks, code generation, and math reasoning. Paper: https://arxiv.org/abs/2408.13359
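The "sparsely activates" mechanic can be illustrated with a toy top-k router. This is a minimal sketch of the general sMoE idea, not PowerMoE's actual routing code: each expert here is just a scalar gain, and the router picks the top-2 of 8 experts per input, so compute scales with k rather than with the total expert count.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, expert_weights, router_weights, k=2):
    """Route input vector x through the top-k experts only.

    expert_weights: one scalar gain per toy expert (expert i computes w_i * x)
    router_weights: one scalar score weight per expert
    Returns (output, indices_of_active_experts).
    """
    # Router scores every expert, but only the top-k are selected and run.
    scores = [rw * sum(x) for rw in router_weights]
    topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gate = softmax([scores[i] for i in topk])  # renormalize over selected experts
    # Weighted sum of the k active experts' outputs; the other experts cost nothing.
    out = [0.0] * len(x)
    for g, i in zip(gate, topk):
        for j, xj in enumerate(x):
            out[j] += g * expert_weights[i] * xj
    return out, topk

random.seed(0)
num_experts = 8
x = [0.5, -1.0, 2.0]
experts = [random.uniform(-1, 1) for _ in range(num_experts)]
router = [random.uniform(-1, 1) for _ in range(num_experts)]
y, active = moe_forward(x, experts, router, k=2)
print(active)  # exactly 2 of the 8 experts were activated for this token
```

In PowerMoE-3B this selection happens per token inside each MoE layer, which is how 800M of the 3B parameters end up active on any given forward pass.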

Usage

Note: Requires installing HF transformers from source.
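A source install of transformers is typically done with pip pointing at the GitHub repository; a sketch (assuming pip and git are available):

```shell
# Install transformers directly from the main branch on GitHub
pip install git+https://github.com/huggingface/transformers.git

# Alternatively, clone and install in editable mode
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```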

Generation

This is a simple example of how to use the PowerMoE-3b model.

Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # or "cpu"
model_path = "ibm/PowerMoE-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."
# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# transfer tokenized inputs to the device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
    print(i)

Capabilities & Tags

transformers · safetensors · granitemoe · text-generation · model-index
Specifications

Category: Chat
Access: API & Local
License: Open Source
Pricing: Open Source
Parameters: 3B
Rating: 1.6
