MusicGen Small Stereo — ONNX (KV-Cache)

ONNX export of `facebook/musicgen-stereo-small` with a KV-cache decoder for efficient on-device autoregressive generation.

Model Details

| Property | Value |
|---|---|
| Base Model | `facebook/musicgen-stereo-small` |
| Precision | FP32 |
| Audio | Stereo (2 channels) |
| Codebooks | 8 (4 per channel) |
| Hidden Size | 1024 |
| Sample Rate | 32 kHz |
| Max Length | 1500 steps (~30 s) |
| Total Size | ~3.7 GB |
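As a rough check on the Max Length row, the sketch below converts decoder steps to audio length. It assumes the 32 kHz EnCodec codec emits 50 frames per second (640 samples per frame). The card does not state this, but it is consistent with 1500 steps ≈ 30 s. The figures are approximate because MusicGen's codebook delay pattern trims a few frames. The helper names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 32_000
FRAME_RATE_HZ = 50                             # assumed EnCodec 32 kHz frame rate
HOP_SAMPLES = SAMPLE_RATE_HZ // FRAME_RATE_HZ  # 640 samples per codec frame
MAX_STEPS = 1500                               # "Max Length" from the table above


def approx_duration_s(steps: int) -> float:
    """Approximate audio length in seconds for a number of decoder steps."""
    return steps / FRAME_RATE_HZ


def approx_samples_per_channel(steps: int) -> int:
    """Approximate PCM samples per channel (stereo output has two channels)."""
    return steps * HOP_SAMPLES


def steps_for_duration(seconds: float) -> int:
    """Decoder steps needed for a target duration, capped at MAX_STEPS."""
    return min(math.ceil(seconds * FRAME_RATE_HZ), MAX_STEPS)
```

For example, a 10 s request maps to about 500 steps, and anything over 30 s is capped at 1500.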

Files

| File | Description | Size |
|---|---|---|
| `decoder_model.onnx` | Step-0 decoder (no KV-cache) | 1.7 GB |
| `decoder_with_past_model.onnx` | Steps 1+ decoder (with KV-cache) | 1.5 GB |
| `text_encoder.onnx` | T5 text encoder | 419 MB |
| `encodec_decode.onnx` | EnCodec audio decoder | 113 MB |
| `tokenizer.json` | T5 tokenizer vocabulary | 2.4 MB |
| `config.json` | Model architecture config | <1 KB |
| `generation_config.json` | Generation parameters | <1 KB |
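Generation needs every file in the table above, so a loader may want to check the model directory before creating any ONNX Runtime sessions. The following is a minimal sketch. The function `missing_files` and the directory layout (all files side by side in one folder) are assumptions.

```python
from pathlib import Path

# File names taken from the table above; assumed to sit in one directory.
REQUIRED_FILES = (
    "text_encoder.onnx",
    "decoder_model.onnx",
    "decoder_with_past_model.onnx",
    "encodec_decode.onnx",
    "tokenizer.json",
    "config.json",
    "generation_config.json",
)


def missing_files(model_dir: str) -> list:
    """Return the required files that are not present in model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```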

Stereo Export Notes

The stereo model uses 8 codebooks (4 per audio channel), but EnCodec has only 4 physical quantizer layers. During export, the EnCodec quantizer's `decode` method was therefore monkeypatched to handle this codebook index mismatch. The EnCodec ONNX exported from the stereo model was then replaced with the mono export, which handles both mono and stereo decoding.
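This sketch shows how a single 4-codebook decoder can serve stereo. In Hugging Face transformers, MusicGen stereo codes are interleaved by channel, with even codebook indices for the left channel and odd indices for the right. Each half is then decoded with the same EnCodec model. The sketch assumes the codes produced by these ONNX graphs follow that layout. `decode_mono` is a hypothetical stand-in for a call into `encodec_decode.onnx`.

```python
from typing import Callable, List, Sequence, Tuple

Stream = List[int]  # token ids for one codebook over time


def split_stereo_codes(codes: Sequence[Stream]) -> Tuple[List[Stream], List[Stream]]:
    """Split interleaved codebook streams into (left, right) groups.

    Assumes even indices belong to the left channel and odd ones to the right.
    """
    if len(codes) % 2 != 0:
        raise ValueError("stereo codes need an even number of codebooks")
    return list(codes[0::2]), list(codes[1::2])


def decode_stereo(
    codes: Sequence[Stream],
    decode_mono: Callable[[List[Stream]], List[float]],  # hypothetical EnCodec wrapper
) -> List[List[float]]:
    """Run the mono decoder once per channel and stack the results as [left, right]."""
    left_codes, right_codes = split_stereo_codes(codes)
    return [decode_mono(left_codes), decode_mono(right_codes)]
```

With 8 streams in, each `decode_mono` call sees 4 codebooks, which matches EnCodec's 4 quantizer layers.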

Usage

These models are designed for the DJNed Android app using ONNX Runtime.

Pipeline

1. Text encoding: the prompt is tokenized with `tokenizer.json` and encoded by `text_encoder.onnx`.
2. Step 0: `decoder_model.onnx` generates the first token for each codebook and the initial KV-cache.
3. Steps 1+: `decoder_with_past_model.onnx` generates each subsequent token, reusing the KV-cache from the previous step.
4. Audio decode: `encodec_decode.onnx` converts the 8 codebook streams (4 per channel) into stereo audio.
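The two-decoder hand-off in steps 2 and 3 can be sketched as a loop. This is a minimal sketch, not the app's implementation. `first_step` and `next_step` are hypothetical wrappers around the two decoder sessions, and `pick` is a hypothetical sampling rule. The KV-cache is treated as an opaque value, whereas the real graphs take and return named past/present tensors. A full implementation would also need to handle MusicGen's codebook delay pattern and classifier-free guidance, which are omitted here.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical session wrappers: each returns (logits, kv_cache).
FirstStep = Callable[[Sequence[int], Any], Tuple[Any, Any]]    # decoder_model.onnx
NextStep = Callable[[Sequence[int], Any, Any], Tuple[Any, Any]]  # decoder_with_past_model.onnx


def generate_codes(
    encoder_hidden: Any,                 # output of text_encoder.onnx
    start_tokens: Sequence[int],         # one start token per codebook
    first_step: FirstStep,
    next_step: NextStep,
    pick: Callable[[Any], List[int]],    # logits -> one token per codebook
    max_steps: int = 1500,
) -> List[List[int]]:
    """Return one token stream per codebook, ready for the audio decoder."""
    # Step 0: no cache in; the graph returns logits plus the initial cache.
    logits, cache = first_step(start_tokens, encoder_hidden)
    tokens = pick(logits)
    frames = [tokens]
    # Steps 1+: feed only the newest tokens together with the previous cache.
    for _ in range(1, max_steps):
        logits, cache = next_step(tokens, encoder_hidden, cache)
        tokens = pick(logits)
        frames.append(tokens)
    # Transpose step-major frames into per-codebook streams.
    return [list(stream) for stream in zip(*frames)]
```

Splitting the work this way lets step 0 run without any past inputs, while every later step processes a single new position and reuses the cached attention keys and values.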

License

This model is derived from Meta's MusicGen under the CC-BY-NC-4.0 license.
