by slseanwu
The MIDI-LLM Llama 3.2 1B model is a specialized adaptation of Llama 3.2 that generates music in MIDI format from textual descriptions. It extends the original vocabulary with dedicated tokens for notes, durations, and instruments, so a piece of music can be produced as a token sequence in the same way as text. Through training on diverse musical data, it combines language understanding with the generation of structured musical sequences. Its primary use cases include automatic score generation, musical accompaniment, and exploring different styles from textual instructions. What sets it apart is that it merges the strengths of a large language model with fine-grained control over individual musical events, offering a new approach to AI-generated music.
Built on Llama 3.2 (1B) with an extended vocabulary for MIDI tokens.
Base model: meta-llama/Llama-3.2-1B
Architecture: LlamaForCausalLM with an extended embedding layer

Clone our GitHub code repo, run through the setup steps, and try:
git clone https://github.com/slSeanWU/MIDI-LLM
cd MIDI-LLM
python generate_transformers.py \
--model slseanwu/MIDI-LLM_Llama-3.2-1B \
--prompt "A cheerful rock song with bright electric guitars" \
--n_outputs 4
The repo and inference scripts provide a more complete usage guide.
The model extends Llama 3.2's vocabulary (128,256 tokens) with 55,030 MIDI tokens representing musical events such as notes, durations, and instruments.
These tokens follow the vocabulary of the Anticipatory Music Transformer (AMT; Thickstun et al., TMLR 2024).
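The vocabulary extension described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the 55,030 MIDI tokens are appended directly after the 128,256 Llama 3.2 ids (so ids 128,256 through 183,285 would be MIDI tokens), and `midi_token_to_id`, `id_to_midi_token`, and `extend_embeddings` are hypothetical helpers. The mean-plus-noise initialization of the new embedding rows is one common heuristic swapped in for illustration; the card does not say how the new rows were initialized.

```python
from typing import Optional

import numpy as np

# Vocabulary sizes stated in this model card.
BASE_VOCAB_SIZE = 128_256  # Llama 3.2 text vocabulary
MIDI_VOCAB_SIZE = 55_030   # added MIDI tokens (AMT vocabulary)
EXTENDED_VOCAB_SIZE = BASE_VOCAB_SIZE + MIDI_VOCAB_SIZE  # 183,286


def midi_token_to_id(midi_index: int) -> int:
    """Map a 0-based MIDI vocabulary index to an id in the extended vocabulary.

    ASSUMPTION: MIDI tokens are appended right after the base vocabulary.
    """
    if not 0 <= midi_index < MIDI_VOCAB_SIZE:
        raise ValueError(f"MIDI index out of range: {midi_index}")
    return BASE_VOCAB_SIZE + midi_index


def id_to_midi_token(token_id: int) -> Optional[int]:
    """Inverse of midi_token_to_id; returns None for ordinary text tokens."""
    if not 0 <= token_id < EXTENDED_VOCAB_SIZE:
        raise ValueError(f"token id out of range: {token_id}")
    if token_id < BASE_VOCAB_SIZE:
        return None
    return token_id - BASE_VOCAB_SIZE


def extend_embeddings(base: np.ndarray, n_new: int, seed: int = 0) -> np.ndarray:
    """Grow a (vocab_size, hidden_size) embedding matrix by n_new rows.

    HYPOTHETICAL initialization: each new row is the mean of the existing
    rows plus small Gaussian noise. The original rows are left untouched.
    """
    rng = np.random.default_rng(seed)
    mean = base.mean(axis=0, keepdims=True)
    noise = rng.normal(scale=1e-3, size=(n_new, base.shape[1]))
    new_rows = (mean + noise).astype(base.dtype)
    return np.concatenate([base, new_rows], axis=0)


# Toy hidden size so the example runs quickly; the real model's is larger.
base = np.zeros((BASE_VOCAB_SIZE, 8), dtype=np.float32)
extended = extend_embeddings(base, MIDI_VOCAB_SIZE)
print(extended.shape)          # (183286, 8)
print(midi_token_to_id(0))     # 128256
print(id_to_midi_token(1000))  # None (a text token)
```

On a real Hugging Face model, the corresponding resizing step is `model.resize_token_embeddings(new_num_tokens)`; since this checkpoint is described as already having an extended embedding layer, you should not need to resize it yourself.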
Prompts should follow this template:

You are a world-class composer. Please compose some music according to the following description: [your input text]

Recommended settings for best results:
temperature: 1.0
top_p: 0.98
max_tokens: 2046
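A minimal sketch of how the prompt template and the recommended settings fit together. `build_prompt` and `top_p_sample` are hypothetical helpers written for illustration, not code from the MIDI-LLM repo; in practice your inference library applies temperature and nucleus (top-p) filtering for you. The sketch shows what the two values mean: with temperature 1.0 the logits are used unscaled, and top_p 0.98 samples only from the smallest set of most likely tokens whose probabilities sum to at least 0.98, trimming roughly the least likely 2% of probability mass.

```python
import math
import random
from typing import List, Optional, Sequence

# Prompt template from this model card; {description} replaces "[your input text]".
PROMPT_TEMPLATE = (
    "You are a world-class composer. Please compose some music according to "
    "the following description: {description}"
)

# Recommended settings from this model card.
RECOMMENDED_SETTINGS = {"temperature": 1.0, "top_p": 0.98, "max_tokens": 2046}


def build_prompt(description: str) -> str:
    """Fill the prompt template with a text description (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(description=description.strip())


def top_p_sample(
    logits: Sequence[float],
    temperature: float = 1.0,
    top_p: float = 0.98,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token index with temperature scaling and nucleus (top-p) filtering."""
    rng = rng or random.Random()
    # Temperature: divide logits before the softmax (1.0 leaves them unchanged).
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most likely tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus: List[int] = []
    cumulative = 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Sample within the nucleus; choices() renormalizes the weights itself.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


prompt = build_prompt("A cheerful rock song with bright electric guitars")
print(prompt)
# One sampling step over a toy 3-token distribution, using the card's settings.
token = top_p_sample(
    [2.0, 1.0, 0.1],
    temperature=RECOMMENDED_SETTINGS["temperature"],
    top_p=RECOMMENDED_SETTINGS["top_p"],
    rng=random.Random(0),
)
print(token)
```

The `max_tokens` value caps the length of the generated token sequence; it is not used in this single-step sketch.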
This model checkpoint was evaluated with FAD (Fréchet Audio Distance) and CLAP (Contrastive Language-Audio Pretraining) metrics on 896 Lakh MIDI examples, whose IDs can be found in our repo.
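For reference, the standard forms of the two metrics are given below. The card does not say which audio embedding models were used or how generated MIDI was rendered to audio; since both metrics operate on audio embeddings, this assumes the MIDI output is first synthesized to audio, and the exact CLAP scoring used for this checkpoint may differ from the common cosine-similarity form shown here.

```latex
% FAD: Frechet distance between Gaussians fitted to audio embeddings of
% reference (r) and generated (g) music, with means mu and covariances Sigma.
% Lower is better.
\mathrm{FAD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

% CLAP score: mean cosine similarity between the CLAP text embedding t_i of
% the i-th description and the CLAP audio embedding a_i of the piece
% generated from it, over N pairs. Higher is better.
\mathrm{CLAP} = \frac{1}{N} \sum_{i=1}^{N}
  \frac{\langle t_i, a_i \rangle}{\lVert t_i \rVert_2 \, \lVert a_i \rVert_2}
```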
If you find our model useful, please cite our research as:
@inproceedings{wu2025midillm,
title={{MIDI-LLM}: Adapting large language models for text-to-{MIDI} music generation},
author={Wu, Shih-Lun and Kim, Yoon and Huang, Cheng-Zhi Anna},
booktitle={Proc. NeurIPS AI4Music Workshop},
year={2025}
}
This model is based on Llama 3.2 and is subject to the Llama 3.2 Community License.