MusicGen Medium — ONNX FP16 (KV-Cache)

ONNX export of `facebook/musicgen-medium` with a KV-cache decoder, stored in FP16 precision for efficient on-device generation.

Model Details

Property	Value
Base Model	`facebook/musicgen-medium`
Precision	FP16 (decoders); FP32 (text encoder, EnCodec)
Audio	Mono (1 channel)
Codebooks	4
Hidden Size	1536
Sample Rate	32 kHz
Max Length	1500 steps (~30 s)
Total Size	~6.8 GB
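
The step count, duration, and sample count in the table are related by simple arithmetic. The sketch below assumes a token frame rate of 50 steps per second, which the table does not state directly but which follows from 1500 steps covering about 30 s. The constant and function names are illustrative only.

```python
# Assumption: one decoder step corresponds to one audio frame at 50 frames/s
# (implied by "1500 steps (~30 s)"; not stated explicitly in the table).
FRAME_RATE_HZ = 50
SAMPLE_RATE_HZ = 32_000  # from the table: 32 kHz
MAX_STEPS = 1500         # from the table


def steps_to_seconds(steps: int) -> float:
    """Duration of audio covered by a given number of decoder steps."""
    return steps / FRAME_RATE_HZ


def steps_to_samples(steps: int) -> int:
    """Number of mono samples covered by a given number of decoder steps."""
    samples_per_step = SAMPLE_RATE_HZ // FRAME_RATE_HZ  # 640 under the assumption above
    return steps * samples_per_step


def seconds_to_steps(seconds: float) -> int:
    """Steps needed for a target duration, capped at the model's max length."""
    return min(MAX_STEPS, round(seconds * FRAME_RATE_HZ))
```

Under these assumptions a 10 s clip needs 500 steps, and any request longer than 30 s is capped at 1500 steps.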

Files

File	Description	Size
`decoder_model.onnx`	Step-0 decoder proto	1.7 MB
`decoder_model.onnx.data`	Step-0 FP16 weights	3.5 GB
`decoder_with_past_model.onnx`	KV-cache decoder proto	1.4 MB
`decoder_with_past_model.onnx.data`	KV-cache FP16 weights	3.1 GB
`text_encoder.onnx`	T5 text encoder	210 MB
`encodec_decode.onnx`	EnCodec audio decoder	57 MB
`tokenizer.json`	T5 tokenizer vocabulary	2.4 MB
`config.json`	Model architecture config	<1 KB
`generation_config.json`	Generation parameters	<1 KB
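
Before loading, it can help to confirm that every file from the table is present in one directory, since each `.onnx.data` file is resolved relative to its `.onnx` graph. The following is a minimal sketch using only the standard library; the `missing_files` helper and the directory argument are hypothetical, not part of this repository.

```python
from pathlib import Path
from typing import List

# File names taken from the table above.
REQUIRED_FILES = [
    "decoder_model.onnx",
    "decoder_model.onnx.data",
    "decoder_with_past_model.onnx",
    "decoder_with_past_model.onnx.data",
    "text_encoder.onnx",
    "encodec_decode.onnx",
    "tokenizer.json",
    "config.json",
    "generation_config.json",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the names of required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory is complete; otherwise the returned names show what still needs to be downloaded.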

FP16 Notes

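The decoder weights are stored in FP16 using the ONNX external data format (the `.onnx.data` files). This roughly halves each decoder's size compared to FP32 (~7 GB → ~3.5 GB per decoder) with minimal quality loss. The text encoder and the EnCodec decoder remain in FP32. Keep each `.onnx.data` file next to its `.onnx` graph, because ONNX resolves external data relative to the model file.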
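
The size claim follows from bytes per weight: FP16 uses 2 bytes and FP32 uses 4. The sketch below checks that arithmetic against the sizes in the Files table and shows the kind of rounding FP16 introduces on a single value. It uses only Python's `struct` module; the helper names are illustrative, and the round trip is a toy demonstration rather than a measurement of this model's quality.

```python
import struct

BYTES_FP16 = struct.calcsize("e")  # IEEE 754 half precision
BYTES_FP32 = struct.calcsize("f")  # IEEE 754 single precision

# Decoder weight sizes in GB, as listed in the Files table.
DECODER_FP16_GB = {
    "decoder_model.onnx.data": 3.5,
    "decoder_with_past_model.onnx.data": 3.1,
}


def fp32_equivalent_gb(fp16_gb: float) -> float:
    """Estimated size of the same weights if stored in FP32."""
    return fp16_gb * BYTES_FP32 / BYTES_FP16


def fp16_round_trip(x: float) -> float:
    """Store a value as FP16 and read it back, exposing the rounding error."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    for name, gb in DECODER_FP16_GB.items():
        print(f"{name}: {gb} GB in FP16, about {fp32_equivalent_gb(gb)} GB in FP32")
    print("0.1 stored as FP16 reads back as", fp16_round_trip(0.1))
```

Values such as 1.0 survive the round trip exactly, while values such as 0.1 come back slightly off.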

Usage

These models are designed for the DJNed Android app using ONNX Runtime.

Pipeline

1. Text encoding: `text_encoder.onnx` encodes the text prompt.
2. Step 0: `decoder_model.onnx` (with `decoder_model.onnx.data`) generates the first token and the initial KV-cache.
3. Steps 1+: `decoder_with_past_model.onnx` (with `decoder_with_past_model.onnx.data`) reuses the KV-cache to generate each subsequent token.
4. Audio decode: `encodec_decode.onnx` converts the codebook tokens to an audio waveform.
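
The control flow of the pipeline can be sketched as a small driver function. This is not the DJNed implementation. The four callables are hypothetical stand-ins for whatever wraps each ONNX graph, and their signatures are assumptions made for illustration. Tensor names, shapes, and the token sampling strategy depend on the export and are deliberately left out.

```python
from typing import Any, Callable, List, Tuple

Tokens = List[int]  # one token per codebook for a single step
Cache = Any         # opaque KV-cache state passed between decoder calls


def generate(
    prompt: str,
    encode_text: Callable[[str], Any],                              # wraps text_encoder.onnx
    decode_first: Callable[[Any], Tuple[Tokens, Cache]],            # wraps decoder_model.onnx
    decode_next: Callable[[Any, Tokens, Cache], Tuple[Tokens, Cache]],  # wraps decoder_with_past_model.onnx
    decode_audio: Callable[[List[Tokens]], List[float]],            # wraps encodec_decode.onnx
    max_steps: int = 1500,
) -> List[float]:
    """Run the four pipeline stages in order and return the waveform."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")

    # Text encoding: run once per prompt.
    text_states = encode_text(prompt)

    # Step 0: produces the first token and the initial KV-cache.
    tokens, cache = decode_first(text_states)
    frames = [tokens]

    # Steps 1+: feed back the previous token and the cache on every step.
    for _ in range(1, max_steps):
        tokens, cache = decode_next(text_states, tokens, cache)
        frames.append(tokens)

    # Audio decode: turn all collected codebook tokens into samples.
    return decode_audio(frames)
```

Passing the stages in as callables keeps the loop independent of the runtime, so the same structure can be exercised with stubs before any model file is loaded.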

License

This model is derived from Meta's MusicGen under the CC-BY-NC-4.0 license.
