by chinedudave06
MusicGen Medium Stereo ONNX is a text-to-music model that generates stereo audio from text descriptions. It is an ONNX export intended for efficient execution on mobile or embedded devices. FP16 weights roughly halve the model size relative to FP32, and the KV-cache decoder avoids recomputing attention over earlier steps, so the model balances audio quality against memory use and generation speed. It suits creative applications such as music production tools, composition assistants, or streaming platforms. Unlike mono variants, it generates both stereo channels directly from the text prompt.
ONNX export of facebook/musicgen-stereo-medium with KV-cache decoder in FP16 precision for efficient on-device stereo generation.
| Property | Value |
|---|---|
| Base Model | facebook/musicgen-stereo-medium |
| Precision | FP16 |
| Audio | Stereo (2 channels) |
| Codebooks | 8 (4 per channel) |
| Hidden Size | 1536 |
| Sample Rate | 32 kHz |
| Max Length | 1500 steps (~30s) |
| Total Size | ~7.0 GB |
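
The "1500 steps (~30s)" figure follows from the codec frame rate. The sketch below works through that arithmetic, assuming the 50 Hz frame rate of the 32 kHz EnCodec codec used by MusicGen; the frame rate is not stated in this card, and the function names are illustrative only.

```python
# Assumption: the 32 kHz EnCodec codec emits 50 frames per second,
# so one decoder step covers 1/50 s of audio. This value is not
# given in the card itself.
FRAME_RATE_HZ = 50
SAMPLE_RATE_HZ = 32_000  # "Sample Rate" row
MAX_STEPS = 1500         # "Max Length" row


def steps_to_seconds(steps: int, frame_rate_hz: int = FRAME_RATE_HZ) -> float:
    """Duration in seconds covered by a number of decoder steps."""
    return steps / frame_rate_hz


def samples_per_channel(steps: int,
                        sample_rate_hz: int = SAMPLE_RATE_HZ,
                        frame_rate_hz: int = FRAME_RATE_HZ) -> int:
    """PCM samples produced per channel for a number of decoder steps."""
    return steps * (sample_rate_hz // frame_rate_hz)


print(steps_to_seconds(MAX_STEPS))     # seconds of audio at the maximum length
print(samples_per_channel(MAX_STEPS))  # samples in each of the 2 channels
```

Under that assumption each step corresponds to 640 samples per channel, which is a handy unit when sizing output buffers.
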

| File | Description | Size |
|---|---|---|
| decoder_model.onnx | Step-0 decoder proto | 1.7 MB |
| decoder_model.onnx.data | Step-0 FP16 weights | 3.5 GB |
| decoder_with_past_model.onnx | KV-cache decoder proto | 1.4 MB |
| decoder_with_past_model.onnx.data | KV-cache FP16 weights | 3.1 GB |
| text_encoder.onnx | T5 text encoder | 210 MB |
| encodec_decode.onnx | EnCodec audio decoder | 57 MB |
| tokenizer.json | T5 tokenizer vocabulary | 2.4 MB |
| config.json | Model architecture config | <1 KB |
| generation_config.json | Generation parameters | <1 KB |
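
For storage planning on a device, it can help to see how the total breaks down. The sketch below tallies the sizes listed in the file table, assuming 1 GB = 1000 MB and ignoring the two sub-kilobyte JSON configs; the helper names are hypothetical.

```python
# Sizes copied from the file table, in megabytes (assumption: 1 GB = 1000 MB).
# config.json and generation_config.json are under 1 KB and are omitted.
FILE_SIZES_MB = {
    "decoder_model.onnx": 1.7,
    "decoder_model.onnx.data": 3500.0,
    "decoder_with_past_model.onnx": 1.4,
    "decoder_with_past_model.onnx.data": 3100.0,
    "text_encoder.onnx": 210.0,
    "encodec_decode.onnx": 57.0,
    "tokenizer.json": 2.4,
}


def total_size_gb(sizes_mb: dict) -> float:
    """Sum of all listed files, in gigabytes."""
    return sum(sizes_mb.values()) / 1000.0


def external_weights_share(sizes_mb: dict) -> float:
    """Fraction of the total taken by the external .onnx.data weight files."""
    weights = sum(v for k, v in sizes_mb.items() if k.endswith(".onnx.data"))
    return weights / sum(sizes_mb.values())


print(f"total: {total_size_gb(FILE_SIZES_MB):.2f} GB")
print(f"external weights: {external_weights_share(FILE_SIZES_MB):.0%} of total")
```

The tally comes to roughly 6.9 GB, consistent with the stated ~7.0 GB, and almost all of it sits in the two decoder weight files.
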

- FP16 weights are stored as external data (`.onnx.data`), halving size relative to FP32 with minimal quality loss.
- The `decode` method was monkeypatched during export to handle the 4→8 codebook index mapping.

These models are designed for the DJNed Android app using ONNX Runtime.
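
One way to picture the 4→8 codebook mapping is as a split of the eight streams into two groups of four, one per channel. The sketch below assumes the streams are interleaved by channel (even indices left, odd indices right); the actual index order used by this export is not stated in the card, and `split_stereo_codebooks` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

CodebookStreams = List[List[int]]


def split_stereo_codebooks(
    streams: Sequence[Sequence[int]],
) -> Tuple[CodebookStreams, CodebookStreams]:
    """Split 8 codebook streams into (left, right) groups of 4.

    Assumption: streams are interleaved by channel, so indices
    0, 2, 4, 6 belong to the left channel and 1, 3, 5, 7 to the right.
    """
    if len(streams) != 8:
        raise ValueError(f"expected 8 codebook streams, got {len(streams)}")
    left = [list(s) for s in streams[0::2]]
    right = [list(s) for s in streams[1::2]]
    return left, right


# Toy input: stream k holds the value k at each of three time steps.
toy_streams = [[k, k, k] for k in range(8)]
left, right = split_stereo_codebooks(toy_streams)
print([s[0] for s in left])   # stream indices routed to the left channel
print([s[0] for s in right])  # stream indices routed to the right channel
```

If the export uses a different layout (for example, the first four streams for one channel), only the two slicing expressions would change.
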

1. `text_encoder.onnx` encodes the text prompt
2. `decoder_model.onnx` + `.data` generates the first token + initial KV-cache
3. `decoder_with_past_model.onnx` + `.data` generates subsequent tokens
4. `encodec_decode.onnx` converts 8 codebook streams to stereo audio

This model is derived from Meta's MusicGen under the CC-BY-NC-4.0 license.
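
The four inference stages can be sketched as a small orchestration loop. This is a minimal sketch rather than the app's implementation: each stage is passed in as a plain callable, because the ONNX graphs' input and output names are not listed in this card. In a real application each callable would presumably wrap an `onnxruntime.InferenceSession` for the corresponding file, and token sampling would live inside the two decoder callables. The function and parameter names are hypothetical.

```python
from typing import Any, Callable, List, Tuple


def generate(
    prompt: str,
    encode_text: Callable[[str], Any],                     # text_encoder.onnx
    first_step: Callable[[Any], Tuple[Any, Any]],          # decoder_model.onnx
    next_step: Callable[[Any, Any], Tuple[Any, Any]],      # decoder_with_past_model.onnx
    decode_audio: Callable[[List[Any]], Any],              # encodec_decode.onnx
    max_steps: int = 1500,
) -> Any:
    """Run the four stages in order and return the decoded audio.

    Assumptions: a "token" is whatever one decoder step yields (for this
    model, one index per codebook), and the KV-cache object carries all
    state the with-past decoder needs between steps.
    """
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")

    text_states = encode_text(prompt)        # stage 1: encode the prompt
    token, cache = first_step(text_states)   # stage 2: first token + initial KV-cache
    tokens = [token]

    for _ in range(max_steps - 1):           # stage 3: reuse and extend the cache
        token, cache = next_step(token, cache)
        tokens.append(token)

    return decode_audio(tokens)              # stage 4: codebook streams to audio


# Stub stages standing in for the ONNX sessions, to show the call order only.
audio = generate(
    "lofi beat",
    encode_text=lambda p: p.upper(),
    first_step=lambda states: (0, {"len": 1}),
    next_step=lambda tok, cache: (tok + 1, {"len": cache["len"] + 1}),
    decode_audio=lambda toks: toks,
    max_steps=4,
)
print(audio)  # [0, 1, 2, 3]
```

Splitting the decoder into a step-0 graph and a with-past graph means the first call builds the cache from scratch and every later call processes a single new step, which is what keeps per-step cost low on device.
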