
MIDI LLM Llama 3.2 1B

by slseanwu

Open source · 4k downloads · 29 likes

Rating: 1.8 (29 reviews) · Audio · API & Local

About
About

The MIDI LLM Llama 3.2 1B model is a specialized adaptation of the Llama 3.2 model, designed to generate music in MIDI format from textual descriptions. It expands the original vocabulary to include dedicated tokens for notes, durations, and instruments, enabling precise and nuanced musical composition. Through training on diverse musical data, it combines language understanding with the generation of structured musical sequences. Its primary use cases include automatic score generation, musical accompaniment, or exploring various styles based on textual instructions. What sets it apart is its ability to merge the strengths of large language models with fine-grained musical expertise, offering an innovative approach to AI-generated music.

Documentation

MIDI-LLM

Built on Llama 3.2 (1B) with an extended vocabulary for MIDI tokens.

Research Paper

  • Shih-Lun Wu, Yoon Kim, and Cheng-Zhi Anna Huang.
    "MIDI-LLM: Adapting large language models for text-to-MIDI music generation."
    NeurIPS AI4Music Workshop, 2025.
    [Code] [Live Demo] [Paper] [Video]

Model Description

  • Base Model: meta-llama/Llama-3.2-1B
  • Model Size: 1.4B parameters
  • Extended Vocabulary: 183,286 tokens (128,256 for text + 55,030 for MIDI music)
  • Architecture: LlamaForCausalLM with extended embedding layer
  • Precision: BFloat16

Quick Start

Clone our GitHub code repo, run through the setup steps, and try:

Bash
git clone https://github.com/slSeanWU/MIDI-LLM
cd MIDI-LLM

python generate_transformers.py \
    --model slseanwu/MIDI-LLM_Llama-3.2-1B \
    --prompt "A cheerful rock song with bright electric guitars" \
    --n_outputs 4

The repo and inference scripts provide a more complete usage guide.

Model Details

Extended Vocabulary

The model extends Llama 3.2's vocabulary (128,256 tokens) with 55,030 MIDI tokens representing:

  • Onset times (when notes occur)
  • Durations (how long each note is held)
  • Instrument-pitch pairs (which note to play & on which instrument)

These tokens follow the vocabulary of Anticipatory Music Transformer (AMT) (Thickstun et al., TMLR 2024).
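The extension pattern can be sketched with a toy Hugging Face Llama config (illustrative sizes, not the released checkpoint): rows for the new tokens are appended to the embedding matrix and LM head via `resize_token_embeddings`, which is how a 128,256-token text vocabulary grows to also cover MIDI events.

```python
# Illustrative sketch of the vocabulary-extension pattern (toy sizes, NOT the
# released 1B checkpoint): append rows for new MIDI tokens to the embedding
# layer and LM head, mirroring 128,256 text + 55,030 MIDI tokens at full scale.
from transformers import LlamaConfig, LlamaForCausalLM

base_vocab = 128      # stands in for Llama 3.2's 128,256 text tokens
n_midi_tokens = 16    # stands in for the 55,030 MIDI tokens

config = LlamaConfig(
    vocab_size=base_vocab,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = LlamaForCausalLM(config)
model.resize_token_embeddings(base_vocab + n_midi_tokens)

print(model.get_input_embeddings().weight.shape[0])  # 144
```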

Training Data

  • Datasets:
    • Continued Pretraining (CPT)
      • music-related text from MusicPile (~1.7B tokens)
      • standalone MIDIs from GigaMIDI (~1.4B tokens after filtering out SFT examples)
    • Supervised Finetuning (SFT)
      • LakhMIDI music paired with MidiCaps text descriptions (~5B tokens with AMT infilling augmentation)
  • Training objective: Causal language modeling
  • Training sequence length: 2,048
  • System prompt: You are a world-class composer. Please compose some music according to the following description: [your input text]
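Following the system prompt above, the full input string can be assembled as below (a minimal sketch; the repo's inference scripts may add special tokens or templating on top):

```python
# Minimal sketch: prepend the documented system prompt to a user description.
# The repo's scripts may apply additional special tokens or chat templating.
SYSTEM_PROMPT = (
    "You are a world-class composer. "
    "Please compose some music according to the following description: "
)

def build_prompt(description: str) -> str:
    """Combine the fixed system prompt with a free-text music description."""
    return SYSTEM_PROMPT + description

prompt = build_prompt("A cheerful rock song with bright electric guitars")
print(prompt)
```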

Inference Hyperparameters

Recommended settings for best results:

YAML
temperature: 1.0
top_p: 0.98
max_tokens: 2046
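Mapped onto the Hugging Face generation API, these settings correspond to something like the following sketch (assuming `max_tokens` means `max_new_tokens`; sampling must be enabled for `temperature`/`top_p` to take effect):

```python
# Sketch: the recommended settings expressed as a transformers GenerationConfig.
# Assumption: max_tokens maps to max_new_tokens; do_sample=True is required for
# temperature/top_p sampling to apply at all.
from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,
    temperature=1.0,
    top_p=0.98,
    max_new_tokens=2046,
)
print(gen_config.temperature, gen_config.top_p, gen_config.max_new_tokens)
```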

Evaluation

This model checkpoint was evaluated with FAD and CLAP metrics on 896 LakhMIDI examples, whose IDs can be found in our repo:

  • https://github.com/slSeanWU/MIDI-LLM/blob/main/assets/evaluation_set_lakh_ids.txt

| Model    | Params | Precision | FAD ↓ | CLAP ↑ |
|----------|--------|-----------|-------|--------|
| MIDI-LLM | 1.47B  | BF16      | 0.173 | 22.1   |
| MIDI-LLM | 1.47B  | FP8       | 0.216 | 21.8   |

Citation

If you find our model useful, please cite our research as:

BibTeX
@inproceedings{wu2025midillm,
  title={{MIDI-LLM}: Adapting large language models for text-to-{MIDI} music generation},
  author={Wu, Shih-Lun and Kim, Yoon and Huang, Cheng-Zhi Anna},
  booktitle={Proc. NeurIPS AI4Music Workshop},
  year={2025}
}

License

This model is based on Llama 3.2 and is subject to the Llama 3.2 Community License.

Capabilities & Tags

transformers · safetensors · llama · text-generation · music · midi · text-to-music · text-to-midi · text-to-audio · en

Specifications

  • Category: Audio
  • Access: API & Local
  • License: Open Source
  • Pricing: Open Source
  • Parameters: 1B
  • Rating: 1.8