by mlx-community
The Llama 3.2 1B Instruct 4-bit model is an optimized and lightweight version of the Llama 3.2 model, designed to run efficiently on local devices with limited resources. It excels in understanding and generating text, providing precise and contextually appropriate responses to natural language instructions. Its key capabilities include processing conversations, synthesizing information, and assisting with writing, all while maintaining quality comparable to larger models. Ideal for developers, researchers, or users seeking a high-performing AI without relying on the cloud, it stands out for its lightweight design and fast execution speed. This model is particularly well-suited for applications requiring smooth conversational interaction, such as virtual assistants or text automation tools.
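The "4-bit" in the model name refers to weight quantization: weights are stored at roughly 4 bits each instead of 16, shrinking memory use about 4x. As a minimal illustration of the general idea (group-wise affine quantization; the group size and exact scheme here are assumptions for the sketch, not MLX's actual implementation):

```python
import numpy as np

def quantize_4bit(w, group_size=32):
    """Group-wise affine 4-bit quantization: each group of weights
    shares a scale and minimum, and values are stored as integers 0..15."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 4 bits -> 16 levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_4bit(q, scale, w_min):
    """Reconstruct approximate float weights from the 4-bit codes."""
    return q.astype(np.float32) * scale + w_min

weights = np.random.randn(4, 32).astype(np.float32)
q, scale, w_min = quantize_4bit(weights)
restored = dequantize_4bit(q, scale, w_min).reshape(weights.shape)
max_err = float(np.abs(weights - restored).max())
```

Rounding to 16 levels per group bounds the per-weight error by half a quantization step, which is why quality stays close to the full-precision model.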
The model mlx-community/Llama-3.2-1B-Instruct-4bit was converted to MLX format from mlx-community/Llama-3.2-1B-Instruct-bf16 using mlx-lm version 0.21.5.
```shell
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Llama-3.2-1B-Instruct-4bit")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
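The `apply_chat_template` step renders the message list into the chat format the instruct model was trained on, appending the header that cues the model to answer. A toy, self-contained sketch of the idea (a simplified Llama-style layout written for illustration, not the tokenizer's actual template):

```python
def render_chat(messages, add_generation_prompt=True):
    """Render a message list into one prompt string using a simplified,
    Llama-style header/turn layout (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

rendered = render_chat([{"role": "user", "content": "hello"}])
```

Using the tokenizer's own template (as in the snippet above) is what keeps prompts consistent with the model's training format.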