musicgen medium stereo onnx

by chinedudave06

Open source · 257 downloads · 0 likes

About

MusicGen Medium Stereo ONNX is a text-to-music model that generates stereo audio from text descriptions. It is an ONNX export intended for efficient execution on mobile or embedded devices. FP16 weights and a KV-cache decoder reduce the model's size and per-step compute while keeping audio quality close to that of the original model. The model suits creative applications such as music production tools, composition assistants, or streaming platforms. Unlike mono variants, it generates both stereo channels directly from the text prompt.

Documentation

MusicGen Medium Stereo — ONNX FP16 (KV-Cache)

ONNX export of facebook/musicgen-stereo-medium with KV-cache decoder in FP16 precision for efficient on-device stereo generation.

Model Details

Property     Value
Base Model   facebook/musicgen-stereo-medium
Precision    FP16
Audio        Stereo (2 channels)
Codebooks    8 (4 per channel)
Hidden Size  1536
Sample Rate  32 kHz
Max Length   1500 steps (~30 s)
Total Size   ~7.0 GB
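
The step count and duration listed above imply roughly 50 decoder steps per second of audio (1500 steps / 30 s). A minimal sketch of that arithmetic follows. The constant and function names are illustrative, and the steps-per-second figure is inferred from the table rather than read from config.json.

```python
SAMPLE_RATE_HZ = 32_000    # from the table: 32 kHz
MAX_STEPS = 1500           # from the table
APPROX_MAX_SECONDS = 30    # from the table (~30 s)

# Inferred from the two table values above, not read from the model config.
STEPS_PER_SECOND = MAX_STEPS / APPROX_MAX_SECONDS


def steps_for_duration(seconds: float) -> int:
    """Decoder steps needed for `seconds` of audio, capped at MAX_STEPS."""
    return min(MAX_STEPS, round(seconds * STEPS_PER_SECOND))


def samples_per_channel(steps: int) -> int:
    """Approximate PCM samples per channel produced by `steps` decoder steps."""
    return round(steps / STEPS_PER_SECOND * SAMPLE_RATE_HZ)
```

Under these assumptions, a 10-second clip needs about 500 steps, and any request longer than about 30 seconds is clamped to the 1500-step maximum.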

Files

File                               Description              Size
decoder_model.onnx                 Step-0 decoder proto     1.7 MB
decoder_model.onnx.data            Step-0 FP16 weights      3.5 GB
decoder_with_past_model.onnx       KV-cache decoder proto   1.4 MB
decoder_with_past_model.onnx.data  KV-cache FP16 weights    3.1 GB
text_encoder.onnx                  T5 text encoder          210 MB
encodec_decode.onnx                EnCodec audio decoder    57 MB
tokenizer.json                     T5 tokenizer vocabulary  2.4 MB
config.json                        Model architecture config  <1 KB
generation_config.json             Generation parameters    <1 KB
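
Because the decoder weights live in external .onnx.data files, each of those files needs to sit next to its .onnx proto when the model is loaded. The sketch below checks a local directory for the files listed above before any session is created. The helper name `missing_files` is hypothetical and not part of the model repository.

```python
from pathlib import Path
from typing import List

# File names taken from the file list above.
EXPECTED_FILES = [
    "decoder_model.onnx",
    "decoder_model.onnx.data",
    "decoder_with_past_model.onnx",
    "decoder_with_past_model.onnx.data",
    "text_encoder.onnx",
    "encodec_decode.onnx",
    "tokenizer.json",
    "config.json",
    "generation_config.json",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty return value means every listed file was found; otherwise the list names what still has to be downloaded.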

Stereo + FP16 Notes

  • Stereo: Uses 8 codebooks (4 per audio channel). The EnCodec decoder handles channel splitting internally.
  • FP16: Decoder weights are stored in FP16 via ONNX external data (.onnx.data), halving their size relative to FP32 with minimal quality loss.
  • Export fix: EnCodec quantizer's decode method was monkeypatched during export to handle the 4→8 codebook index mapping.
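
To illustrate the "4 per channel" layout, here is a minimal sketch that splits 8 codebook streams into two groups of 4. It assumes the streams are interleaved (even indices for the left channel, odd indices for the right), which the notes above do not state. Since encodec_decode.onnx performs the channel split internally, application code would not normally need this step; it is shown only to make the layout concrete.

```python
from typing import List, Sequence, Tuple

NUM_CODEBOOKS = 8  # from the model details: 8 codebooks, 4 per channel


def split_stereo_codebooks(
    codes: Sequence[Sequence[int]],
) -> Tuple[List[List[int]], List[List[int]]]:
    """Split 8 codebook streams into (left, right) groups of 4 streams each.

    ASSUMPTION: streams are interleaved, so even-indexed streams belong to
    the left channel and odd-indexed streams to the right. The actual order
    used by this export is not documented in the model card.
    """
    if len(codes) != NUM_CODEBOOKS:
        raise ValueError(f"expected {NUM_CODEBOOKS} streams, got {len(codes)}")
    left = [list(stream) for stream in codes[0::2]]
    right = [list(stream) for stream in codes[1::2]]
    return left, right
```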

Usage

These models are designed for the DJNed Android app using ONNX Runtime.

Pipeline

  1. Text encoding: text_encoder.onnx encodes the text prompt
  2. Step 0: decoder_model.onnx + .data generates the first token + initial KV-cache
  3. Steps 1+: decoder_with_past_model.onnx + .data generates subsequent tokens
  4. Audio decode: encodec_decode.onnx converts 8 codebook streams to stereo audio
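
The four stages above can be sketched as a simple control loop. This is only an outline under stated assumptions: the four callables are hypothetical stand-ins for wrappers around the ONNX Runtime sessions, because the model card does not list the input and output tensor names of the exported graphs. Token sampling and any MusicGen-specific codebook handling are left out.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical signatures; each callable would wrap one ONNX session.
EncodeText = Callable[[str], Any]              # prompt -> encoder states
FirstStep = Callable[[Any], Tuple[Any, Any]]   # encoder states -> (token, cache)
NextStep = Callable[[Any, Any], Tuple[Any, Any]]  # (token, cache) -> (token, cache)
DecodeAudio = Callable[[List[Any]], Any]       # tokens -> stereo audio


def run_pipeline(
    encode_text: EncodeText,
    first_step: FirstStep,
    next_step: NextStep,
    decode_audio: DecodeAudio,
    prompt: str,
    num_steps: int,
) -> Any:
    """Run the text -> step 0 -> cached steps -> audio sequence."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    encoder_states = encode_text(prompt)        # 1. text encoding
    token, cache = first_step(encoder_states)   # 2. step 0 builds the KV-cache
    tokens = [token]
    for _ in range(num_steps - 1):              # 3. steps 1+ reuse the cache
        token, cache = next_step(token, cache)
        tokens.append(token)
    return decode_audio(tokens)                 # 4. audio decode
```

Keeping the step-0 and cached-step decoders as separate callables mirrors the two decoder files: the first call has no cache to consume, while every later call takes the previous token together with the cache produced so far.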

License

This model is derived from Meta's MusicGen under the CC-BY-NC-4.0 license.

Capabilities & Tags
onnxruntime, onnx, musicgen, music-generation, kv-cache, text-to-audio, fp16, stereo, on-device, android
Specifications
Category  Audio
Access    API & Local
License   Open Source
Pricing   Open Source