by mlx-community
The Llama 3.2 1B Instruct 4-bit model is an optimized and lightweight version of the Llama 3.2 model, designed to run efficiently on local devices with limited resources. It excels in understanding and generating text, providing precise and contextually appropriate responses to natural language instructions. Its key capabilities include processing conversations, synthesizing information, and assisting with writing, all while maintaining quality comparable to larger models. Ideal for developers, researchers, or users seeking a high-performing AI without relying on the cloud, it stands out for its lightweight design and fast execution speed. This model is particularly well-suited for applications requiring smooth conversational interaction, such as virtual assistants or text automation tools.
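The "4-bit" in the model name refers to weight quantization: weights are stored at roughly 4 bits each instead of 16, shrinking memory use about 4x. As a minimal illustration of the general idea (group-wise affine quantization; the group size and exact scheme here are assumptions for the sketch, not MLX's actual implementation):

```python
import numpy as np

def quantize_4bit(w, group_size=32):
    """Group-wise affine 4-bit quantization: each group of weights
    shares a scale and minimum, and values are stored as integers 0..15."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0            # 4 bits -> 16 levels
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_4bit(q, scale, w_min):
    """Reconstruct approximate float weights from the 4-bit codes."""
    return q.astype(np.float32) * scale + w_min

weights = np.random.randn(4, 32).astype(np.float32)
q, scale, w_min = quantize_4bit(weights)
restored = dequantize_4bit(q, scale, w_min).reshape(weights.shape)
max_err = float(np.abs(weights - restored).max())
```

Rounding to 16 levels per group bounds the per-weight error by half a quantization step, which is why quality stays close to the full-precision model.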
The model mlx-community/Llama-3.2-1B-Instruct-4bit was converted to MLX format from mlx-community/Llama-3.2-1B-Instruct-bf16 using mlx-lm version 0.21.5.
```shell
pip install mlx-lm
```
```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Llama-3.2-1B-Instruct-4bit")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
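The `apply_chat_template` step renders the message list into the chat format the instruct model was trained on, appending the header that cues the model to answer. A toy, self-contained sketch of the idea (a simplified Llama-style layout written for illustration, not the tokenizer's actual template):

```python
def render_chat(messages, add_generation_prompt=True):
    """Render a message list into one prompt string using a simplified,
    Llama-style header/turn layout (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

rendered = render_chat([{"role": "user", "content": "hello"}])
```

Using the tokenizer's own template (as in the snippet above) is what keeps prompts consistent with the model's training format.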