by mlx-community
This model, gpt-oss-20b-MXFP4-Q8, is a quantized conversion of openai/gpt-oss-20b adapted to run with the MLX library on Apple silicon. It handles text generation, contextual understanding, and general natural-language tasks, with primary use cases in assisted writing, text analysis, content creation, and conversational applications. What sets it apart is the quantized MXFP4 Q8 format, which reduces memory consumption for efficient local execution while preserving response quality, making it a practical choice for developers and researchers who want a balance between capability and accessibility.
This model mlx-community/gpt-oss-20b-MXFP4-Q8 was converted to MLX format from openai/gpt-oss-20b using mlx-lm version 0.27.0.
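A conversion like this can be reproduced with mlx-lm's command-line converter. The invocation below is a sketch: the `--hf-path`, `--mlx-path`, and `-q` flags follow `mlx_lm.convert`'s CLI, but the exact quantization settings used to produce this particular MXFP4-Q8 build are not stated in the card and are an assumption here.

```shell
# Download the original Hugging Face weights, convert them to MLX
# format, and quantize (-q). The default quantization options are an
# assumption; they may differ from those used for this repository.
mlx_lm.convert \
    --hf-path openai/gpt-oss-20b \
    --mlx-path gpt-oss-20b-mlx \
    -q
```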
```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Downloads the weights on first use and loads model + tokenizer.
model, tokenizer = load("mlx-community/gpt-oss-20b-MXFP4-Q8")

prompt = "hello"

# If the tokenizer ships a chat template, wrap the prompt in the
# model's expected conversation format before generating.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
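For quick experiments without writing any Python, mlx-lm also ships a command-line generator. A minimal sketch, using `mlx_lm.generate`'s standard `--model`, `--prompt`, and `--max-tokens` flags:

```shell
# One-off generation from the terminal; the model is fetched from the
# Hugging Face Hub on first use. --max-tokens caps the response length.
mlx_lm.generate \
    --model mlx-community/gpt-oss-20b-MXFP4-Q8 \
    --prompt "hello" \
    --max-tokens 256
```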