by chinedudave06
MusicGen Medium ONNX is an ONNX export of the MusicGen music generation model, packaged to run efficiently on mobile or embedded devices. It generates music from text descriptions, producing clips of up to about 30 seconds with coherent melodies that follow the prompt. The decoder weights are stored in FP16, which roughly halves their size compared to FP32 with minimal loss in sound quality, and a KV-cache decoder avoids recomputing attention over earlier tokens at every generation step. This makes the model suitable for mobile applications such as the DJNed app. It can produce a wide range of musical pieces, from classical styles to more experimental creations, and is aimed at content creators, amateur musicians, and developers who want to integrate text-to-music generation into their projects.
ONNX export of facebook/musicgen-medium with a KV-cache decoder in FP16 precision for efficient on-device generation.
| Property | Value |
|---|---|
| Base Model | facebook/musicgen-medium |
| Precision | FP16 |
| Audio | Mono (1 channel) |
| Codebooks | 4 |
| Hidden Size | 1536 |
| Sample Rate | 32 kHz |
| Max Length | 1500 steps (~30s) |
| Total Size | ~6.8 GB |
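The step count, clip length, and sample rate in the table are tied together by the token frame rate. The short check below is a sketch that assumes 50 decoder steps per second of audio (the rate implied by 1500 steps for roughly 30 s) and ignores any extra steps a codebook delay pattern might consume. The function names are illustrative and not part of the model files.

```python
SAMPLE_RATE_HZ = 32_000   # "Sample Rate" row of the table
NUM_CODEBOOKS = 4         # "Codebooks" row of the table
STEPS_PER_SECOND = 50     # assumption: implied by 1500 steps ~ 30 s


def steps_to_seconds(steps: int) -> float:
    """Approximate audio duration produced by a number of decoder steps."""
    return steps / STEPS_PER_SECOND


def steps_to_samples(steps: int) -> int:
    """Approximate number of mono PCM samples after EnCodec decoding."""
    return steps * SAMPLE_RATE_HZ // STEPS_PER_SECOND


def steps_to_tokens(steps: int) -> int:
    """Total codebook tokens generated (one token per codebook per step)."""
    return steps * NUM_CODEBOOKS


print(steps_to_seconds(1500))  # 30.0
print(steps_to_samples(1500))  # 960000
print(steps_to_tokens(1500))   # 6000
```

Under these assumptions, a full-length generation corresponds to about 960,000 mono samples, which is useful when sizing output buffers on a device.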

| File | Description | Size |
|---|---|---|
| decoder_model.onnx | Step-0 decoder proto | 1.7 MB |
| decoder_model.onnx.data | Step-0 FP16 weights | 3.5 GB |
| decoder_with_past_model.onnx | KV-cache decoder proto | 1.4 MB |
| decoder_with_past_model.onnx.data | KV-cache FP16 weights | 3.1 GB |
| text_encoder.onnx | T5 text encoder | 210 MB |
| encodec_decode.onnx | EnCodec audio decoder | 57 MB |
| tokenizer.json | T5 tokenizer vocabulary | 2.4 MB |
| config.json | Model architecture config | <1 KB |
| generation_config.json | Generation parameters | <1 KB |
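As a rough cross-check of the total download size, the listed file sizes can be summed. This sketch assumes 1 GB = 1000 MB and treats the two sub-kilobyte JSON configs as negligible. The listed sizes are rounded, so the sum only approximately matches the stated total.

```python
# Sizes copied from the file table, converted to MB (1 GB = 1000 MB assumed).
FILE_SIZES_MB = {
    "decoder_model.onnx": 1.7,
    "decoder_model.onnx.data": 3500.0,
    "decoder_with_past_model.onnx": 1.4,
    "decoder_with_past_model.onnx.data": 3100.0,
    "text_encoder.onnx": 210.0,
    "encodec_decode.onnx": 57.0,
    "tokenizer.json": 2.4,
}


def total_size_gb(sizes_mb: dict) -> float:
    """Sum all file sizes and convert to GB."""
    return sum(sizes_mb.values()) / 1000.0


def decoder_share(sizes_mb: dict) -> float:
    """Fraction of the total taken by the two decoders (proto + weights)."""
    decoder_mb = sum(v for k, v in sizes_mb.items() if k.startswith("decoder_"))
    return decoder_mb / sum(sizes_mb.values())


print(round(total_size_gb(FILE_SIZES_MB), 1))   # close to the ~6.8 GB total
print(round(decoder_share(FILE_SIZES_MB), 2))   # share held by the decoders
```

The two decoders account for nearly all of the footprint, so any further size reduction would have to come from them.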
The decoder weights are stored in FP16 precision using the ONNX external data format (.onnx.data files). This halves the model size compared to FP32 (~7 GB → ~3.5 GB per decoder) with minimal quality loss. The text encoder and EnCodec remain in FP32.
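With external data, the small .onnx file holds only the graph and refers to a separate weights file, typically by a path relative to the .onnx file, so each decoder and its .onnx.data file need to be kept together. The helper below is a minimal sketch of a pre-flight check an app might run after downloading. The function name and the flat directory layout are assumptions, not something the model ships with.

```python
from pathlib import Path
from typing import List

# Decoder graphs from the file table that rely on external weight files.
EXTERNAL_DATA_PROTOS = ("decoder_model.onnx", "decoder_with_past_model.onnx")


def missing_external_data(model_dir: str) -> List[str]:
    """Return the decoder files (graph or weights) missing from model_dir.

    Assumes each graph and its '.data' sibling sit in the same directory.
    """
    root = Path(model_dir)
    missing = []
    for proto in EXTERNAL_DATA_PROTOS:
        for name in (proto, proto + ".data"):
            if not (root / name).is_file():
                missing.append(name)
    return missing
```

An empty list means both decoder pairs are present. A non-empty list names the files to download again before creating any inference session.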
These models are designed for the DJNed Android app using ONNX Runtime.
1. text_encoder.onnx encodes the text prompt
2. decoder_model.onnx + .data generates the first token + initial KV-cache
3. decoder_with_past_model.onnx + .data generates subsequent tokens
4. encodec_decode.onnx converts codebook tokens to audio waveform

This model is derived from Meta's MusicGen under the CC-BY-NC-4.0 license.
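The control flow of the four-stage pipeline (text encoder, step-0 decoder, KV-cache decoder, EnCodec decoder) can be sketched as follows. This is an illustration under stated assumptions, not the DJNed implementation. The four callables are hypothetical wrappers around the corresponding ONNX sessions, and their signatures are invented for the example. Token selection uses greedy argmax for simplicity, whereas generation_config.json may specify sampling. The codebook delay pattern and classifier-free guidance used by MusicGen are omitted.

```python
from typing import Any, Callable, List, Sequence, Tuple

Logits = Sequence[Sequence[float]]  # [codebook][vocabulary entry]
Frame = List[int]                   # one token id per codebook


def greedy_pick(logits: Logits) -> Frame:
    """Pick the highest-scoring token for each codebook (simplification)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def generate(
    prompt: str,
    encode_text: Callable[[str], Any],                         # text_encoder.onnx
    decode_first: Callable[[Any], Tuple[Logits, Any]],         # decoder_model.onnx
    decode_next: Callable[[Frame, Any, Any], Tuple[Logits, Any]],  # decoder_with_past_model.onnx
    decode_audio: Callable[[List[Frame]], Any],                # encodec_decode.onnx
    max_steps: int = 1500,
) -> Any:
    """Run the four stages in order and return the decoded audio."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")

    # Stage 1: encode the prompt once; the result is reused at every step.
    text_states = encode_text(prompt)

    # Stage 2: the step-0 decoder yields the first frame and the initial cache.
    logits, kv_cache = decode_first(text_states)
    frames = [greedy_pick(logits)]

    # Stage 3: each later step feeds only the newest frame plus the cache.
    while len(frames) < max_steps:
        logits, kv_cache = decode_next(frames[-1], text_states, kv_cache)
        frames.append(greedy_pick(logits))

    # Stage 4: turn the codebook tokens into a waveform.
    return decode_audio(frames)
```

In a real app each callable would wrap a call to an ONNX Runtime session. The actual input and output tensor names are not documented here and would need to be read from the exported graphs, for example with `InferenceSession.get_inputs()` in the Python API.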