by chinedudave06
Open source · 291 downloads · 0 likes
MusicGen Small ONNX is an artificial intelligence model specialized in generating music from text descriptions. It autonomously creates musical pieces by interpreting prompts such as "a soft jazz melody" or "an energetic electronic rhythm." Thanks to its optimized architecture with a KV-cache decoder, it delivers enhanced performance for real-time generation, particularly on mobile devices. This model stands out for its ability to produce coherent and high-quality musical sequences while remaining lightweight and suitable for embedded use. It is especially useful for content creators, music app developers, or artists looking to quickly explore sound ideas.
ONNX export of facebook/musicgen-small with KV-cache decoder for efficient on-device autoregressive generation.
| Property | Value |
|---|---|
| Base Model | facebook/musicgen-small |
| Precision | FP32 |
| Audio | Mono (1 channel) |
| Codebooks | 4 |
| Hidden Size | 1024 |
| Sample Rate | 32 kHz |
| Max Length | 1500 steps (~30s) |
| Total Size | ~3.6 GB |
| File | Description | Size |
|---|---|---|
decoder_model.onnx | Step-0 decoder (no KV-cache) | 1.6 GB |
decoder_with_past_model.onnx | Steps 1+ decoder (with KV-cache) | 1.4 GB |
text_encoder.onnx | T5 text encoder | 419 MB |
encodec_decode.onnx | EnCodec audio decoder | 113 MB |
tokenizer.json | T5 tokenizer vocabulary | 2.4 MB |
config.json | Model architecture config | <1 KB |
generation_config.json | Generation parameters | <1 KB |
These models are designed for the DJNed Android app using ONNX Runtime. The KV-cache decoder pair enables O(1) per-step generation instead of O(n).
text_encoder.onnx encodes the text promptdecoder_model.onnx generates the first token + initial KV-cachedecoder_with_past_model.onnx generates subsequent tokens using KV-cacheencodec_decode.onnx converts codebook tokens to audio waveformThis model is derived from Meta's MusicGen under the CC-BY-NC-4.0 license.